stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-0.6B_strategy_trust_t1_g2_metrics
Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-0.6B_strategy_trust_t1_g2_metrics
Description:
This dataset supports code analysis and software metrics evaluation. It contains 164 samples and provides a training split only. Each sample represents one code task, identified by a task ID and an entry point, and is annotated with whether the code is executable, whether it is correct, the number of tests passed and failed, the test run time (which may be null), and the error type. The dataset also provides a range of software metrics: Halstead measures (vocabulary, length, volume, difficulty, effort, and time), cyclomatic complexity, maintainability index, lines of code (LOC), source lines of code (SLOC), comment percentage, type-token ratio (TTR), a token dictionary, Shannon entropy, mean and maximum predictive entropy, the number of functions defined, and whether the entry point is repeated. These features can be used for code quality assessment, defect prediction, or as inputs to other machine learning tasks.
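Several of the metrics named above have standard textbook definitions. The following is a minimal sketch of how the Halstead measures, TTR, and Shannon entropy are conventionally computed; it assumes the usual formulas and does not claim to reproduce the tooling or tokenization actually used to build this dataset. The function names and the dictionary keys are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def halstead(n1, n2, N1, N2):
    """Textbook Halstead measures (assumed formulas, not the dataset's own code).

    n1, n2: number of distinct operators / distinct operands
    N1, N2: total number of operators / total number of operands
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary > 0 else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 > 0 else 0.0
    effort = difficulty * volume
    time_seconds = effort / 18  # conventional Stroud number of 18
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "time": time_seconds,
    }


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for an empty sequence."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def shannon_entropy(tokens):
    """Entropy in bits of the empirical token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    tokens = ["a", "b", "a", "b"]  # toy token stream, not taken from the dataset
    print(type_token_ratio(tokens))
    print(shannon_entropy(tokens))
    print(halstead(n1=2, n2=2, N1=4, N2=4))
```

How a source file is split into tokens, operators, and operands depends on the analyzer, so values computed this way may differ from the ones stored in the dataset.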
Provider:
stefanocarrera



