
xszheng2020/the_stack_dedup_python_hits_1

Source: Hugging Face. Updated 2025-09-20; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/xszheng2020/the_stack_dedup_python_hits_1
Resource description:
The dataset records per-file features such as hash, size, extension, and programming language. It also carries repository-level information: star, issue, and fork counts, together with their associated timestamps. In addition, it provides a set of code quality signals, including average line length, number of characters, and unique word ratio, plus quality signals specific to Python code. The data is provided as a training split, with its size and number of examples documented.
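As a rough illustration of the generic quality signals named in the description (average line length, number of characters, unique word ratio), the following minimal sketch computes them for a single source string. The function name `quality_signals`, the dictionary keys, and the exact definitions (for example, treating a "word" as a run of word characters) are assumptions made for illustration; the dataset's own column names and formulas are not stated in this listing and may differ.

```python
import re


def quality_signals(source: str) -> dict:
    """Compute a few simple quality signals for one source file.

    Hypothetical helper: the keys and definitions are illustrative
    assumptions, not the dataset's documented schema.
    """
    lines = source.splitlines()
    # Assumption: a "word" is any maximal run of letters, digits, or underscores.
    words = re.findall(r"\w+", source)
    return {
        # Total characters, newlines included.
        "num_chars": len(source),
        "num_lines": len(lines),
        # Mean line length, excluding the newline characters themselves.
        "avg_line_length": sum(len(line) for line in lines) / len(lines) if lines else 0.0,
        # Share of distinct words among all words; low values suggest repetition.
        "unique_word_ratio": len(set(words)) / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    sample = "x = 1\ny = x + 1\n"
    print(quality_signals(sample))
```

Signals like these are typically used as filters, for instance to drop files with extremely long lines or very low word diversity, though the thresholds appropriate for this dataset are not given here.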
Provider:
xszheng2020