iamtarun/code_contest_processed

Name: iamtarun/code_contest_processed
Creator: iamtarun
Published: 2023-07-27 15:40:46
License: No description available

Hugging Face | Updated 2023-07-27 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/iamtarun/code_contest_processed


Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: description
    dtype: string
  - name: code
    dtype: string
  - name: language
    dtype:
      class_label:
        names:
          '0': UNKNOWN
          '1': Python2
          '2': C++
          '3': Python3
          '4': JAVA
  - name: test_samples
    sequence:
    - name: input
      dtype: string
    - name: output
      dtype: string
  - name: source
    dtype:
      class_label:
        names:
          '0': UNKNOWN_SOURCE
          '1': CODECHEF
          '2': CODEFORCES
          '3': HACKEREARTH
          '4': CODEJAM
          '5': ATCODER
          '6': AIZU
  splits:
  - name: train
    num_bytes: 3321514817
    num_examples: 38438
  - name: valid
    num_bytes: 122746000
    num_examples: 396
  - name: test
    num_bytes: 77106001
    num_examples: 514
  download_size: 1047406436
  dataset_size: 3521366818
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
  - split: test
    path: data/test-*
task_categories:
- text-generation
- text2text-generation
- question-answering
tags:
- code
size_categories:
- 10K<n<100K
---

# Dataset Card for Code Contest Processed

## Dataset Summary

This dataset was created by processing the [code_contests dataset from DeepMind](https://huggingface.co/datasets/deepmind/code_contests). It is a competitive programming dataset for machine learning. Read more about the dataset at the [original source](https://huggingface.co/datasets/deepmind/code_contests).

## Columns Description

- `id` : unique string associated with a problem
- `description` : problem description
- `code` : one correct solution to the problem
- `language` : programming language used for the code
- `test_samples` : inputs and their corresponding outputs for the problem
- `source` : source of the problem

Provider:

iamtarun

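Summary of original information

Dataset overview

Dataset features

id: string, unique identifier.
description: string, problem description.
code: string, a correct solution to the problem.
language: class label, programming language; values: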
- 0: UNKNOWN
- 1: Python2
- 2: C++
- 3: Python3
- 4: JAVA
test_samples: sequence, containing:
- input: string, input data.
- output: string, output data.
source: class label, problem source; values:
- 0: UNKNOWN_SOURCE
- 1: CODECHEF
- 2: CODEFORCES
- 3: HACKEREARTH
- 4: CODEJAM
- 5: ATCODER
- 6: AIZU
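
The integer codes listed above can be mapped back to their names, and `test_samples` can be regrouped into input/output pairs. The following is a minimal sketch in plain Python, not the dataset author's code. The record is hypothetical, and the dict-of-lists layout for `test_samples` is an assumption based on how the Hugging Face `datasets` library usually returns a sequence of named fields.

```python
# Label names copied from the feature listing; list index == integer code.
LANGUAGE_NAMES = ["UNKNOWN", "Python2", "C++", "Python3", "JAVA"]
SOURCE_NAMES = [
    "UNKNOWN_SOURCE", "CODECHEF", "CODEFORCES", "HACKEREARTH",
    "CODEJAM", "ATCODER", "AIZU",
]


def decode_record(record):
    """Replace label codes with names and pair up the test samples."""
    samples = record["test_samples"]  # assumed shape: {"input": [...], "output": [...]}
    return {
        "id": record["id"],
        "language": LANGUAGE_NAMES[record["language"]],
        "source": SOURCE_NAMES[record["source"]],
        "test_pairs": list(zip(samples["input"], samples["output"])),
    }


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "id": "example-0",
    "description": "Print the sum of two integers.",
    "code": "print(sum(map(int, input().split())))",
    "language": 3,
    "test_samples": {"input": ["1 2\n", "5 7\n"], "output": ["3\n", "12\n"]},
    "source": 2,
}

decoded = decode_record(example)
print(decoded["language"], decoded["source"], len(decoded["test_pairs"]))
```

Keeping the names in lists indexed by code mirrors the `'0'`, `'1'`, ... ordering of the class labels, so no separate lookup table is needed.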

Dataset splits

train: 38438 examples, 3321514817 bytes.
valid: 396 examples, 122746000 bytes.
test: 514 examples, 77106001 bytes.
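
To put the split counts in proportion, the short sketch below computes the share of examples in each split. It uses only the numbers listed above; the variable names are illustrative.

```python
# Example counts copied from the split listing.
SPLIT_EXAMPLES = {"train": 38438, "valid": 396, "test": 514}

total_examples = sum(SPLIT_EXAMPLES.values())

# Fraction of all examples that falls in each split.
split_share = {name: n / total_examples for name, n in SPLIT_EXAMPLES.items()}

for name, share in split_share.items():
    print(f"{name}: {SPLIT_EXAMPLES[name]} examples ({share:.2%})")
```

The train split holds the large majority of the examples, while valid and test are each on the order of one percent.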

Dataset size

Download size: 1047406436 bytes.
Dataset size: 3521366818 bytes.
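
The byte counts are easier to read in larger units, and they can be cross-checked against the per-split sizes. The sketch below does both using only the figures from this page. The helper name `to_gib` is illustrative, and the gap between the two sizes is presumably due to compressed storage, which the page does not state.

```python
# Sizes in bytes, copied from the listing.
SPLIT_BYTES = {"train": 3321514817, "valid": 122746000, "test": 77106001}
DOWNLOAD_SIZE = 1047406436
DATASET_SIZE = 3521366818

GIB = 1024 ** 3


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / GIB


# The per-split byte counts should add up to the reported dataset size.
splits_total = sum(SPLIT_BYTES.values())
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"splits total equals dataset size: {splits_total == DATASET_SIZE}")
print(f"dataset size:  {to_gib(DATASET_SIZE):.2f} GiB")
print(f"download size: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"download / dataset: {download_ratio:.1%}")
```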

Configuration

default: contains the path configuration for the train, validation, and test data.
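
One way to load a split is through the Hugging Face `datasets` library. This is a sketch that assumes the library is installed and the Hub is reachable. The helper `load_split` is hypothetical, and the file patterns are those declared for the `default` config in the dataset card.

```python
REPO_ID = "iamtarun/code_contest_processed"

# Split -> file pattern, as declared for the `default` config.
DATA_FILES = {
    "train": "data/train-*",
    "valid": "data/valid-*",
    "test": "data/test-*",
}


def load_split(split="valid"):
    """Download (if needed) and return one split of the dataset."""
    if split not in DATA_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}"
        )
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


# Usage (requires network access; the valid split is the smallest by example count):
# valid = load_split("valid")
# print(valid[0]["id"])
```

Note that the validation split is named `valid`, not `validation`, so the split name has to be passed exactly as listed.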

Task categories

Text generation
Text-to-text generation
Question answering

Size category

10K<n<100K
