xszheng2020/the_stack_dedup_python
Source: Hugging Face. Updated: 2025-09-19. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/xszheng2020/the_stack_dedup_python
Description:
This machine learning dataset pairs repository statistics with code quality signals. Its features include file name, size, extension, and programming language, along with repository star, issue, and fork counts. The quality signals include line, character, and word counts, duplication rate, and comment ratio, among others. The dataset is suited to machine learning tasks such as code quality assessment and repository analysis.
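To make the quality signals above concrete, here is a minimal Python sketch of how signals of this kind could be computed for a single source file. The function name `quality_signals`, the dictionary keys, and the exact definitions (for example, counting only lines that start with `#` as comments) are assumptions made for illustration. The listing does not state how the dataset actually defines or names these columns.

```python
from collections import Counter


def quality_signals(source: str) -> dict:
    """Compute a few illustrative quality signals for one Python file.

    The key names and definitions here are hypothetical; they are not
    taken from the dataset's schema.
    """
    lines = source.splitlines()
    # Strip whitespace and ignore blank lines for the ratio-based signals.
    non_blank = [ln.strip() for ln in lines if ln.strip()]
    n = len(non_blank)

    # Assumption: a comment line is a non-blank line starting with '#'.
    comment_lines = sum(1 for ln in non_blank if ln.startswith("#"))

    # Assumption: duplication is the share of non-blank lines that repeat
    # an earlier line exactly (after stripping whitespace).
    counts = Counter(non_blank)
    duplicated = sum(c - 1 for c in counts.values())

    return {
        "num_lines": len(lines),
        "num_chars": len(source),
        "num_words": len(source.split()),
        "comment_ratio": comment_lines / n if n else 0.0,
        "dup_line_ratio": duplicated / n if n else 0.0,
    }


sample = (
    "# add two numbers\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "x = add(1, 2)\n"
    "x = add(1, 2)\n"
)
signals = quality_signals(sample)
print(signals)
```

Signals like these are often used as filters, for example to drop files whose duplication ratio exceeds a chosen threshold. Any such threshold is a design choice and is not specified by this listing.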
Provider:
xszheng2020



