systemk/codenet

Name: systemk/codenet
Creator: systemk
Published: 2024-02-16 05:11:12
License: No description available

Hugging Face · Updated 2024-02-16 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/systemk/codenet

Resource summary:

---
license: cdla-permissive-2.0
task_categories:
- text-generation
- text-classification
dataset_info:
- config_name: accepted
  features:
  - name: code
    dtype: string
  - name: submission_id
    dtype: string
  - name: problem_id
    dtype: string
  - name: user_id
    dtype: string
  - name: date
    dtype: string
  - name: language
    dtype:
      class_label:
        names:
          '0': C++
          '1': C
          '2': Java
          '3': Python
          '4': Go
          '5': Ruby
          '6': C#
          '7': OCaml
          '8': Rust
          '9': JavaScript
          '10': PHP
          '11': Scala
          '12': Other
  - name: original_language
    dtype: string
  - name: filename_ext
    dtype: string
  - name: status
    dtype:
      class_label:
        names:
          '0': Accepted
          '1': Compile Error
          '2': Runtime Error
          '3': Time Limit Exceeded
          '4': Memory Limit Exceeded
          '5': Wrong Answer
          '6': Other
  - name: cpu_time
    dtype: int32
  - name: memory
    dtype: int32
  - name: code_size
    dtype: int32
  - name: accuracy
    dtype: string
  splits:
  - name: train
    num_bytes: 7182604796.348241
    num_examples: 5004663
  - name: validation
    num_bytes: 1886090799.288453
    num_examples: 1500962
  - name: test
    num_bytes: 1350931397.6121755
    num_examples: 954963
  download_size: 4841625499
  dataset_size: 10419626993.248869
- config_name: default
  features:
  - name: code
    dtype: string
  - name: submission_id
    dtype: string
  - name: problem_id
    dtype: string
  - name: user_id
    dtype: string
  - name: date
    dtype: string
  - name: language
    dtype:
      class_label:
        names:
          '0': C++
          '1': C
          '2': Java
          '3': Python
          '4': Go
          '5': Ruby
          '6': C#
          '7': OCaml
          '8': Rust
          '9': JavaScript
          '10': PHP
          '11': Scala
          '12': Other
  - name: original_language
    dtype: string
  - name: filename_ext
    dtype: string
  - name: status
    dtype:
      class_label:
        names:
          '0': Accepted
          '1': Compile Error
          '2': Runtime Error
          '3': Time Limit Exceeded
          '4': Memory Limit Exceeded
          '5': Wrong Answer
          '6': Other
  - name: cpu_time
    dtype: int32
  - name: memory
    dtype: int32
  - name: code_size
    dtype: int32
  - name: accuracy
    dtype: string
  splits:
  - name: train
    num_bytes: 13719235606
    num_examples: 9559227
  - name: validation
    num_bytes: 3300894506
    num_examples: 2626871
  - name: test
    num_bytes: 2448421072
    num_examples: 1730770
  download_size: 7476817454
  dataset_size: 19468551184
- config_name: mini
  features:
  - name: code
    dtype: string
  - name: submission_id
    dtype: string
  - name: problem_id
    dtype: string
  - name: user_id
    dtype: string
  - name: date
    dtype: string
  - name: language
    dtype:
      class_label:
        names:
          '0': C++
          '1': C
          '2': Java
          '3': Python
          '4': Go
          '5': Ruby
          '6': C#
          '7': OCaml
          '8': Rust
          '9': JavaScript
          '10': PHP
          '11': Scala
          '12': Other
  - name: original_language
    dtype: string
  - name: filename_ext
    dtype: string
  - name: status
    dtype:
      class_label:
        names:
          '0': Accepted
          '1': Compile Error
          '2': Runtime Error
          '3': Time Limit Exceeded
          '4': Memory Limit Exceeded
          '5': Wrong Answer
          '6': Other
  - name: cpu_time
    dtype: int32
  - name: memory
    dtype: int32
  - name: code_size
    dtype: int32
  - name: accuracy
    dtype: string
  splits:
  - name: train
    num_bytes: 2821205
    num_examples: 5399
  - name: validation
    num_bytes: 1108361
    num_examples: 1200
  - name: test
    num_bytes: 1426005
    num_examples: 2225
  download_size: 1913743
  dataset_size: 5355571
configs:
- config_name: accepted
  data_files:
  - split: train
    path: accepted/train-*
  - split: validation
    path: accepted/validation-*
  - split: test
    path: accepted/test-*
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
- config_name: mini
  data_files:
  - split: train
    path: mini/train-*
  - split: validation
    path: mini/validation-*
  - split: test
    path: mini/test-*
tags:
- code
---

# Dataset Card for Dataset Name

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

## Dataset Details

### Dataset Description

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

Provided by:

systemk

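Summary of original information

Dataset Overview

Dataset Details

Dataset Description

The dataset has three configurations: `accepted`, `default` and `mini`. All three share the same features and differ only in their data splits and sizes. Details for each configuration follow: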

Configuration `accepted`

Features:
- code: string
- submission_id: string
- problem_id: string
- user_id: string
- date: string
- language: class label with values C++, C, Java, Python, Go, Ruby, C#, OCaml, Rust, JavaScript, PHP, Scala, Other
- original_language: string
- filename_ext: string
- status: class label with values Accepted, Compile Error, Runtime Error, Time Limit Exceeded, Memory Limit Exceeded, Wrong Answer, Other
- cpu_time: 32-bit integer
- memory: 32-bit integer
- code_size: 32-bit integer
- accuracy: string
Splits:
- train: 7182604796.348241 bytes, 5004663 examples
- validation: 1886090799.288453 bytes, 1500962 examples
- test: 1350931397.6121755 bytes, 954963 examples
Download size: 4841625499 bytes
Dataset size: 10419626993.248869 bytes
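
The `language` and `status` fields are stored as integer class labels. The sketch below maps those integers back to names using the label order in the card's `class_label` definitions. The helper names `decode_label` and `encode_label` are illustrative and not part of any library. If you load the data with the Hugging Face `datasets` library, the `ClassLabel` feature's `int2str` and `str2int` methods do the same job.

```python
# Label order copied from the card's class_label definitions (index == stored integer).
LANGUAGE_NAMES = [
    "C++", "C", "Java", "Python", "Go", "Ruby", "C#",
    "OCaml", "Rust", "JavaScript", "PHP", "Scala", "Other",
]
STATUS_NAMES = [
    "Accepted", "Compile Error", "Runtime Error", "Time Limit Exceeded",
    "Memory Limit Exceeded", "Wrong Answer", "Other",
]


def decode_label(names: list[str], index: int) -> str:
    """Hypothetical helper: map an integer class label to its name."""
    if not 0 <= index < len(names):
        raise ValueError(f"label {index} is outside 0..{len(names) - 1}")
    return names[index]


def encode_label(names: list[str], name: str) -> int:
    """Hypothetical helper: map a label name to its integer id (ValueError if unknown)."""
    return names.index(name)


if __name__ == "__main__":
    print(decode_label(LANGUAGE_NAMES, 3))       # Python
    print(decode_label(STATUS_NAMES, 5))         # Wrong Answer
    print(encode_label(LANGUAGE_NAMES, "Rust"))  # 8
```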

Configuration `default`

Features:
- code: string
- submission_id: string
- problem_id: string
- user_id: string
- date: string
- language: class label with values C++, C, Java, Python, Go, Ruby, C#, OCaml, Rust, JavaScript, PHP, Scala, Other
- original_language: string
- filename_ext: string
- status: class label with values Accepted, Compile Error, Runtime Error, Time Limit Exceeded, Memory Limit Exceeded, Wrong Answer, Other
- cpu_time: 32-bit integer
- memory: 32-bit integer
- code_size: 32-bit integer
- accuracy: string
Splits:
- train: 13719235606 bytes, 9559227 examples
- validation: 3300894506 bytes, 2626871 examples
- test: 2448421072 bytes, 1730770 examples
Download size: 7476817454 bytes
Dataset size: 19468551184 bytes
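
The `default` split figures can be cross-checked with simple arithmetic. The per-split `num_bytes` values sum exactly to the reported dataset size. Dividing bytes by example counts gives a rough per-row size of about 1.4 KB. This is a minimal sketch using only numbers copied from the card; the function names are hypothetical. Note that `num_bytes` is the reported prepared size, not the compressed download size.

```python
# Split figures for the `default` configuration, copied from the card:
# split name -> (num_bytes, num_examples)
DEFAULT_SPLITS = {
    "train": (13_719_235_606, 9_559_227),
    "validation": (3_300_894_506, 2_626_871),
    "test": (2_448_421_072, 1_730_770),
}
DEFAULT_DATASET_SIZE = 19_468_551_184


def total_bytes(splits):
    """Sum of num_bytes over all splits."""
    return sum(num_bytes for num_bytes, _ in splits.values())


def total_examples(splits):
    """Sum of num_examples over all splits."""
    return sum(count for _, count in splits.values())


def avg_bytes_per_example(splits):
    """Average reported bytes per example, per split."""
    return {name: num_bytes / count for name, (num_bytes, count) in splits.items()}


if __name__ == "__main__":
    print(total_bytes(DEFAULT_SPLITS) == DEFAULT_DATASET_SIZE)  # True
    print(total_examples(DEFAULT_SPLITS))                       # 13916868
    print({k: round(v) for k, v in avg_bytes_per_example(DEFAULT_SPLITS).items()})
    # {'train': 1435, 'validation': 1257, 'test': 1415}
```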

Configuration `mini`

Features:
- code: string
- submission_id: string
- problem_id: string
- user_id: string
- date: string
- language: class label with values C++, C, Java, Python, Go, Ruby, C#, OCaml, Rust, JavaScript, PHP, Scala, Other
- original_language: string
- filename_ext: string
- status: class label with values Accepted, Compile Error, Runtime Error, Time Limit Exceeded, Memory Limit Exceeded, Wrong Answer, Other
- cpu_time: 32-bit integer
- memory: 32-bit integer
- code_size: 32-bit integer
- accuracy: string
Splits:
- train: 2821205 bytes, 5399 examples
- validation: 1108361 bytes, 1200 examples
- test: 1426005 bytes, 2225 examples
Download size: 1913743 bytes
Dataset size: 5355571 bytes
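
The `mini` configuration has a few thousand rows, so it is small enough to explore in memory. The sketch below computes an acceptance rate per language from rows shaped like the card's schema. It assumes each row is a dict with integer `language` and `status` labels. Such rows could come from something like `datasets.load_dataset("systemk/codenet", "mini", split="train")`, which is not run here. The sample rows and the function name are hypothetical.

```python
from collections import Counter

# Label order copied from the card; "Accepted" is status 0.
LANGUAGE_NAMES = [
    "C++", "C", "Java", "Python", "Go", "Ruby", "C#",
    "OCaml", "Rust", "JavaScript", "PHP", "Scala", "Other",
]
ACCEPTED = 0


def acceptance_rate_by_language(rows):
    """Hypothetical helper: fraction of submissions with status Accepted, per language."""
    totals = Counter()
    accepted = Counter()
    for row in rows:
        lang = LANGUAGE_NAMES[row["language"]]
        totals[lang] += 1
        if row["status"] == ACCEPTED:
            accepted[lang] += 1
    return {lang: accepted[lang] / totals[lang] for lang in totals}


if __name__ == "__main__":
    # Hypothetical rows; only the fields used above are filled in.
    rows = [
        {"problem_id": "p00001", "language": 3, "status": 0},  # Python, Accepted
        {"problem_id": "p00001", "language": 3, "status": 5},  # Python, Wrong Answer
        {"problem_id": "p00002", "language": 0, "status": 0},  # C++, Accepted
    ]
    print(acceptance_rate_by_language(rows))  # {'Python': 0.5, 'C++': 1.0}
```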

Dataset configurations

Configuration accepted:
- Data file paths:
  - train: accepted/train-*
  - validation: accepted/validation-*
  - test: accepted/test-*
Configuration default:
- Data file paths:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
Configuration mini:
- Data file paths:
  - train: mini/train-*
  - validation: mini/validation-*
  - test: mini/test-*
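
The `data_files` patterns above determine which repository files belong to each configuration and split. The sketch below resolves a repository-relative path to its `(config, split)` pair using Python's `fnmatch`. This is a simplification: `fnmatch`'s `*` also matches `/`, which may differ from the Hub's own glob handling. The shard file names in the example are hypothetical.

```python
from fnmatch import fnmatch

# data_files patterns copied from the card's `configs` section.
DATA_FILES = {
    "accepted": {"train": "accepted/train-*", "validation": "accepted/validation-*", "test": "accepted/test-*"},
    "default": {"train": "data/train-*", "validation": "data/validation-*", "test": "data/test-*"},
    "mini": {"train": "mini/train-*", "validation": "mini/validation-*", "test": "mini/test-*"},
}


def locate(path):
    """Return (config, split) for the first pattern matching `path`, or None."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            if fnmatch(path, pattern):
                return config, split
    return None


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    print(locate("accepted/validation-00000-of-00004.parquet"))  # ('accepted', 'validation')
    print(locate("data/train-00003-of-00015.parquet"))           # ('default', 'train')
    print(locate("README.md"))                                   # None
```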