
stojchet/csn_filtered_subset

Source: Hugging Face. Updated 2024-06-02; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/stojchet/csn_filtered_subset
Resource description:
---
dataset_info:
- config_name: go
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 877763487.7648174
    num_examples: 282184
  - name: validation
    num_bytes: 39170182.18101263
    num_examples: 12719
  - name: test
    num_bytes: 34052630.992171414
    num_examples: 11092
  download_size: 210439217
  dataset_size: 950986300.9380014
- config_name: java
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 1131253062.4569702
    num_examples: 363676
  - name: validation
    num_bytes: 38123098.138120554
    num_examples: 12379
  - name: test
    num_bytes: 66732964.28748918
    num_examples: 21737
  download_size: 356500962
  dataset_size: 1236109124.88258
- config_name: javascript
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 292440593.8633003
    num_examples: 94014
  - name: validation
    num_bytes: 18410201.201202415
    num_examples: 5978
  - name: test
    num_bytes: 15181235.147519622
    num_examples: 4945
  download_size: 140053671
  dataset_size: 326032030.21202236
- config_name: php
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 1188463343.9292386
    num_examples: 382068
  - name: validation
    num_bytes: 56548697.63407138
    num_examples: 18362
  - name: test
    num_bytes: 61572265.34249818
    num_examples: 20056
  download_size: 390937771
  dataset_size: 1306584306.905808
- config_name: python
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 1093813798.2225087
    num_examples: 351640
  - name: validation
    num_bytes: 61670170.4673935
    num_examples: 20025
  - name: test
    num_bytes: 57998765.299684666
    num_examples: 18892
  download_size: 536239180
  dataset_size: 1213482733.9895868
- config_name: ruby
  features:
  - name: repository_name
    dtype: string
  - name: func_path_in_repository
    dtype: string
  - name: func_name
    dtype: string
  - name: whole_func_string
    dtype: string
  - name: language
    dtype: string
  - name: func_code_string
    dtype: string
  - name: func_code_tokens
    sequence: string
  - name: func_documentation_string
    dtype: string
  - name: func_documentation_tokens
    sequence: string
  - name: split_name
    dtype: string
  - name: func_code_url
    dtype: string
  splits:
  - name: train
    num_bytes: 131009429.35882549
    num_examples: 42117
  - name: validation
    num_bytes: 5820555.414900061
    num_examples: 1890
  - name: test
    num_bytes: 5679531.854987118
    num_examples: 1850
  download_size: 37336246
  dataset_size: 142509516.62871265
configs:
- config_name: go
  data_files:
  - split: train
    path: go/train-*
  - split: validation
    path: go/validation-*
  - split: test
    path: go/test-*
- config_name: java
  data_files:
  - split: train
    path: java/train-*
  - split: validation
    path: java/validation-*
  - split: test
    path: java/test-*
- config_name: javascript
  data_files:
  - split: train
    path: javascript/train-*
  - split: validation
    path: javascript/validation-*
  - split: test
    path: javascript/test-*
- config_name: php
  data_files:
  - split: train
    path: php/train-*
  - split: validation
    path: php/validation-*
  - split: test
    path: php/test-*
- config_name: python
  data_files:
  - split: train
    path: python/train-*
  - split: validation
    path: python/validation-*
  - split: test
    path: python/test-*
- config_name: ruby
  data_files:
  - split: train
    path: ruby/train-*
  - split: validation
    path: ruby/validation-*
  - split: test
    path: ruby/test-*
---
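One quick sanity check on this metadata is that the per-split `num_bytes` values of a config add up to its reported `dataset_size`. The sketch below hard-codes the Go and Ruby figures from the metadata. The helper name `splits_match_total` and the tolerance are assumptions made for illustration and are not part of the dataset.

```python
import math

# Figures copied from the dataset_info metadata for two of the six configs.
GO = {
    "splits": {
        "train": 877763487.7648174,
        "validation": 39170182.18101263,
        "test": 34052630.992171414,
    },
    "dataset_size": 950986300.9380014,
}
RUBY = {
    "splits": {
        "train": 131009429.35882549,
        "validation": 5820555.414900061,
        "test": 5679531.854987118,
    },
    "dataset_size": 142509516.62871265,
}


def splits_match_total(info, rel_tol=1e-9):
    """Return True if the split sizes sum to the reported dataset_size.

    The byte counts are fractional floats, so compare with a tolerance
    rather than with ==.
    """
    total = sum(info["splits"].values())
    return math.isclose(total, info["dataset_size"], rel_tol=rel_tol)


if __name__ == "__main__":
    for name, info in (("go", GO), ("ruby", RUBY)):
        print(name, splits_match_total(info))
```

The same check can be applied to the remaining configs by copying their figures into the same dictionary shape.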
Provider:
stojchet
Summary of original information

Dataset overview

Dataset configurations and features

  1. Go dataset

    • Features:
      • repository_name: string
      • func_path_in_repository: string
      • func_name: string
      • whole_func_string: string
      • language: string
      • func_code_string: string
      • func_code_tokens: sequence of strings
      • func_documentation_string: string
      • func_documentation_tokens: sequence of strings
      • split_name: string
      • func_code_url: string
    • Splits:
      • Train: 282184 examples, 877763487.7648174 bytes
      • Validation: 12719 examples, 39170182.18101263 bytes
      • Test: 11092 examples, 34052630.992171414 bytes
    • Download size: 210439217 bytes
    • Dataset size: 950986300.9380014 bytes
  2. Java dataset

    • Features: same as for Go
    • Splits:
      • Train: 363676 examples, 1131253062.4569702 bytes
      • Validation: 12379 examples, 38123098.138120554 bytes
      • Test: 21737 examples, 66732964.28748918 bytes
    • Download size: 356500962 bytes
    • Dataset size: 1236109124.88258 bytes
  3. JavaScript dataset

    • Features: same as for Go
    • Splits:
      • Train: 94014 examples, 292440593.8633003 bytes
      • Validation: 5978 examples, 18410201.201202415 bytes
      • Test: 4945 examples, 15181235.147519622 bytes
    • Download size: 140053671 bytes
    • Dataset size: 326032030.21202236 bytes
  4. PHP dataset

    • Features: same as for Go
    • Splits:
      • Train: 382068 examples, 1188463343.9292386 bytes
      • Validation: 18362 examples, 56548697.63407138 bytes
      • Test: 20056 examples, 61572265.34249818 bytes
    • Download size: 390937771 bytes
    • Dataset size: 1306584306.905808 bytes
  5. Python dataset

    • Features: same as for Go
    • Splits:
      • Train: 351640 examples, 1093813798.2225087 bytes
      • Validation: 20025 examples, 61670170.4673935 bytes
      • Test: 18892 examples, 57998765.299684666 bytes
    • Download size: 536239180 bytes
    • Dataset size: 1213482733.9895868 bytes
  6. Ruby dataset

    • Features: same as for Go
    • Splits:
      • Train: 42117 examples, 131009429.35882549 bytes
      • Validation: 1890 examples, 5820555.414900061 bytes
      • Test: 1850 examples, 5679531.854987118 bytes
    • Download size: 37336246 bytes
    • Dataset size: 142509516.62871265 bytes
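
The shared record schema and the split counts listed above can be captured in a few lines of Python. This is an illustrative sketch only: `CsnRecord`, `SPLIT_COUNTS`, and `total_examples` are hypothetical names introduced here, while the field names and example counts are copied from the listing.

```python
from typing import Dict, List, Optional, TypedDict


class CsnRecord(TypedDict):
    """One example; all six language configs share these eleven fields."""
    repository_name: str
    func_path_in_repository: str
    func_name: str
    whole_func_string: str
    language: str
    func_code_string: str
    func_code_tokens: List[str]
    func_documentation_string: str
    func_documentation_tokens: List[str]
    split_name: str
    func_code_url: str


# Example counts per config and split, copied from the listing.
SPLIT_COUNTS: Dict[str, Dict[str, int]] = {
    "go": {"train": 282184, "validation": 12719, "test": 11092},
    "java": {"train": 363676, "validation": 12379, "test": 21737},
    "javascript": {"train": 94014, "validation": 5978, "test": 4945},
    "php": {"train": 382068, "validation": 18362, "test": 20056},
    "python": {"train": 351640, "validation": 20025, "test": 18892},
    "ruby": {"train": 42117, "validation": 1890, "test": 1850},
}


def total_examples(split: Optional[str] = None) -> int:
    """Sum examples across all languages, for one split or for all splits."""
    if split is None:
        return sum(sum(counts.values()) for counts in SPLIT_COUNTS.values())
    return sum(counts[split] for counts in SPLIT_COUNTS.values())


if __name__ == "__main__":
    print(total_examples("train"))  # 1515699
    print(total_examples())         # 1665624
```

Summed over the six configs, the listing amounts to 1,515,699 training, 71,353 validation, and 78,572 test examples.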

Data file paths

  • Go:

    • Train: go/train-*
    • Validation: go/validation-*
    • Test: go/test-*
  • Java:

    • Train: java/train-*
    • Validation: java/validation-*
    • Test: java/test-*
  • JavaScript:

    • Train: javascript/train-*
    • Validation: javascript/validation-*
    • Test: javascript/test-*
  • PHP:

    • Train: php/train-*
    • Validation: php/validation-*
    • Test: php/test-*
  • Python:

    • Train: python/train-*
    • Validation: python/validation-*
    • Test: python/test-*
  • Ruby:

    • Train: ruby/train-*
    • Validation: ruby/validation-*
    • Test: ruby/test-*
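
The path patterns above follow a single `<language>/<split>-*` convention, which the sketch below reproduces. The helper names `data_file_patterns` and `load_split` are hypothetical. `load_split` assumes the Hugging Face `datasets` package is installed, that network access is available, and that the repository is still published under the same name.

```python
LANGUAGES = ("go", "java", "javascript", "php", "python", "ruby")
SPLITS = ("train", "validation", "test")


def data_file_patterns(language):
    """Return the split -> glob pattern mapping for one language config."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown config: {language!r}")
    return {split: f"{language}/{split}-*" for split in SPLITS}


def load_split(language, split="train"):
    """Load one split of one config through the `datasets` library.

    Assumption: `datasets` is installed and the Hub is reachable. The import
    is done lazily so that data_file_patterns() has no third-party dependency.
    """
    from datasets import load_dataset

    if language not in LANGUAGES or split not in SPLITS:
        raise ValueError(f"unknown config or split: {language!r}, {split!r}")
    return load_dataset("stojchet/csn_filtered_subset", language, split=split)


if __name__ == "__main__":
    print(data_file_patterns("ruby"))
```

A call such as `load_split("ruby", "test")` would download only the Ruby config, the smallest of the six, which makes it a reasonable first try.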