
davidgaofc/techdebt

Source: Hugging Face · Updated 2023-12-04 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/davidgaofc/techdebt
Resource description:
---
license: mit
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: Diff
    dtype: string
  - name: FaultInducingLabel
    dtype: int64
  splits:
  - name: train
    num_bytes: 89390701
    num_examples: 207464
  - name: test
    num_bytes: 29611000
    num_examples: 69155
  - name: validation
    num_bytes: 29496034
    num_examples: 69155
  download_size: 56932761
  dataset_size: 148497735
---

# Dataset Card for TechDebt

This dataset was generated from [The Technical Debt Dataset](https://github.com/clowee/The-Technical-Debt-Dataset) created by Lenarduzzi et al.; the citation is given under References.

## Dataset Details and Structure

The labels were produced by the SZZ algorithm cited in the paper and matched to the diff of the commit where the technical debt was located. Each diff was then cleaned to include only the added lines of code.

## Bias, Risks, and Limitations

The dataset is imbalanced; account for this when using it. The queries used to extract the data are also still being checked for correctness.

## Recommendations

The dataset is still being revised to improve it. Keep this in mind when using it.

## References

Valentina Lenarduzzi, Nyyti Saarimäki, Davide Taibi. The Technical Debt Dataset. Proceedings of the 15th Conference on Predictive Models and Data Analytics in Software Engineering. Brazil. 2019.
Provided by:
davidgaofc
Summary of original information

Dataset Card for TechDebt

Dataset details and structure

Configuration

  • Default configuration
    • Data files
      • Training set: path data/train-*
      • Test set: path data/test-*
      • Validation set: path data/validation-*
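
The configuration above can be consumed with the Hugging Face `datasets` library. The following is a minimal sketch, assuming that library is installed and that the repository `davidgaofc/techdebt` is still reachable; the helper names `load_techdebt` and `matches_schema` are hypothetical and not part of the dataset card.

```python
# Repository id and feature names are taken from the dataset card.
REPO_ID = "davidgaofc/techdebt"
EXPECTED_FEATURES = {"Diff": str, "FaultInducingLabel": int}


def load_techdebt():
    """Download all splits (train, test, validation) as a DatasetDict.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    The import is done lazily so the rest of this module works offline.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID)


def matches_schema(example):
    """Return True if a row has the two features with the expected Python types."""
    return all(
        isinstance(example.get(name), expected_type)
        for name, expected_type in EXPECTED_FEATURES.items()
    )
```

Typical use would be `ds = load_techdebt()` followed by `matches_schema(ds["train"][0])` as a quick sanity check before training.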

Dataset information

  • Features
    • Diff: type string
    • FaultInducingLabel: type int64
  • Splits
    • Training set
      • Bytes: 89390701
      • Examples: 207464
    • Test set
      • Bytes: 29611000
      • Examples: 69155
    • Validation set
      • Bytes: 29496034
      • Examples: 69155
  • Download size: 56932761
  • Dataset size: 148497735
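
The split figures above imply roughly a 60/20/20 partition, and the per-split byte counts add up to the stated dataset size. A small sketch that checks this arithmetic; the numbers are copied from the card, while the names `SPLITS` and `split_fractions` are illustrative only.

```python
# Split sizes copied from the dataset card metadata.
SPLITS = {
    "train": {"num_bytes": 89390701, "num_examples": 207464},
    "test": {"num_bytes": 29611000, "num_examples": 69155},
    "validation": {"num_bytes": 29496034, "num_examples": 69155},
}
DATASET_SIZE = 148497735  # "dataset_size" from the card, in bytes


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(s["num_examples"] for s in splits.values())
    return {name: s["num_examples"] / total for name, s in splits.items()}


total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
fractions = split_fractions(SPLITS)

if __name__ == "__main__":
    print("total examples:", total_examples)
    print("bytes match dataset_size:", total_bytes == DATASET_SIZE)
    for name, share in fractions.items():
        print(f"{name}: {share:.1%}")
```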

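Bias, risks, and limitations

Be aware that the data is imbalanced. The queries used to extract the data are still being checked for correctness.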
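
One common way to compensate for label imbalance is inverse-frequency class weighting. The card does not state the actual label ratio or recommend a remedy, so the sketch below uses a hypothetical toy label list and a hypothetical helper name `inverse_frequency_weights`; it is one possible approach, not the dataset author's method.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rarer classes get larger weights.

    n is the number of labels and k the number of distinct classes.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Toy labels for illustration only; NOT the real FaultInducingLabel distribution.
toy_labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(toy_labels)
```

With the real data, the `FaultInducingLabel` column of the training split would take the place of `toy_labels`, and the resulting weights could be passed to a loss function or sampler that supports per-class weighting.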

Recommendations

The dataset is still being revised; keep this in mind when using it.
