iamtarun/code_contest_processed
Source: Hugging Face. Updated 2023-07-27; added 2024-03-04.
Download link:
https://hf-mirror.com/datasets/iamtarun/code_contest_processed
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: description
    dtype: string
  - name: code
    dtype: string
  - name: language
    dtype:
      class_label:
        names:
          '0': UNKNOWN
          '1': Python2
          '2': C++
          '3': Python3
          '4': JAVA
  - name: test_samples
    sequence:
    - name: input
      dtype: string
    - name: output
      dtype: string
  - name: source
    dtype:
      class_label:
        names:
          '0': UNKNOWN_SOURCE
          '1': CODECHEF
          '2': CODEFORCES
          '3': HACKEREARTH
          '4': CODEJAM
          '5': ATCODER
          '6': AIZU
  splits:
  - name: train
    num_bytes: 3321514817
    num_examples: 38438
  - name: valid
    num_bytes: 122746000
    num_examples: 396
  - name: test
    num_bytes: 77106001
    num_examples: 514
  download_size: 1047406436
  dataset_size: 3521366818
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
  - split: test
    path: data/test-*
task_categories:
- text-generation
- text2text-generation
- question-answering
tags:
- code
size_categories:
- 10K<n<100K
---
# Dataset Card for Code Contest Processed
## Dataset Summary
This dataset was created by processing the [code_contests dataset from DeepMind](https://huggingface.co/datasets/deepmind/code_contests). It is a competitive-programming dataset for machine learning. Read more about the dataset at the [original source](https://huggingface.co/datasets/deepmind/code_contests).
## Column Descriptions
- `id` : unique string identifying a problem
- `description` : problem statement
- `code` : one correct solution to the problem
- `language` : programming language of the `code` solution
- `test_samples` : test inputs and their corresponding expected outputs for the problem
- `source` : platform the problem was taken from
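Each row pairs a reference solution with `test_samples`. That means a row can be sanity-checked by running `code` on every input and comparing the result with the expected output. The sketch below is a minimal, hypothetical checker. `run_python3_solution` is not part of the dataset or of any library. The sketch makes three assumptions:

- The row's `language` is Python3.
- `test_samples` arrives as a dict of parallel `input`/`output` lists, which is how the `datasets` library represents a sequence of named fields.
- A whitespace-insensitive comparison is acceptable.

It executes the code unsandboxed, so use it only on trusted code or inside an isolated environment.

```python
import subprocess
import sys


def run_python3_solution(code: str, test_samples: dict, timeout: float = 5.0) -> bool:
    """Run a Python 3 `code` string against every test sample (hypothetical helper).

    Assumes `test_samples` is a dict of parallel lists:
    {"input": [...], "output": [...]}.
    """
    for stdin_text, expected in zip(test_samples["input"], test_samples["output"]):
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],  # current interpreter, assumed Python 3
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failed test
        if result.returncode != 0:
            return False  # runtime error
        # Whitespace-insensitive token comparison: a simplifying assumption,
        # not the original judge's checker.
        if result.stdout.split() != expected.split():
            return False
    return True


if __name__ == "__main__":
    # Made-up record shaped like a row of this dataset.
    record = {
        "code": "a, b = map(int, input().split())\nprint(a + b)",
        "test_samples": {"input": ["1 2\n", "10 -3\n"], "output": ["3\n", "7\n"]},
    }
    print(run_python3_solution(record["code"], record["test_samples"]))  # True
```

Comparing whitespace-split tokens tolerates trailing spaces and newlines. Problems that accept several valid answers, or that allow floating-point tolerance, would need a custom checker instead.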
Provided by: iamtarun
Summary of Original Information
Dataset Overview
Dataset Features
- id: string, unique identifier.
- description: string, problem statement.
- code: string, a correct solution to the problem.
- language: class label, programming language; one of:
- 0: UNKNOWN
- 1: Python2
- 2: C++
- 3: Python3
- 4: JAVA
- test_samples: sequence, containing:
- input: string, input data.
- output: string, expected output data.
- source: class label, problem source; one of:
- 0: UNKNOWN_SOURCE
- 1: CODECHEF
- 2: CODEFORCES
- 3: HACKEREARTH
- 4: CODEJAM
- 5: ATCODER
- 6: AIZU
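Because `language` and `source` are stored as integer class labels, they need decoding before they are human-readable. The sketch below decodes them with name lists copied from the metadata above. `decode_labels` and `python3_only` are hypothetical helpers, and the rows in the usage example are made up.

```python
# Label names copied from the dataset metadata; index i is class label i.
LANGUAGE_NAMES = ["UNKNOWN", "Python2", "C++", "Python3", "JAVA"]
SOURCE_NAMES = [
    "UNKNOWN_SOURCE", "CODECHEF", "CODEFORCES", "HACKEREARTH",
    "CODEJAM", "ATCODER", "AIZU",
]


def decode_labels(record: dict) -> dict:
    """Return a copy of `record` with integer labels replaced by their names."""
    decoded = dict(record)
    decoded["language"] = LANGUAGE_NAMES[record["language"]]
    decoded["source"] = SOURCE_NAMES[record["source"]]
    return decoded


def python3_only(records: list) -> list:
    """Keep only rows whose solution is labelled Python3 (class label 3)."""
    python3_id = LANGUAGE_NAMES.index("Python3")
    return [r for r in records if r["language"] == python3_id]


if __name__ == "__main__":
    rows = [
        {"id": "a", "language": 3, "source": 2},
        {"id": "b", "language": 2, "source": 5},
    ]
    print([decode_labels(r) for r in rows])
    print([r["id"] for r in python3_only(rows)])  # ['a']
```

When the dataset is loaded with the Hugging Face `datasets` library, the same mapping should also be available from the features themselves, e.g. `ds.features["language"].int2str(3)`. For large splits, `ds.filter(...)` can replace the list comprehension.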
Dataset Splits
- train: 38438 examples, 3321514817 bytes (≈3.32 GB).
- valid: 396 examples, 122746000 bytes (≈122.7 MB).
- test: 514 examples, 77106001 bytes (≈77.1 MB).
Dataset Size
- Download size: 1047406436 bytes (≈1.05 GB).
- Dataset size: 3521366818 bytes (≈3.52 GB).
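The split sizes are internally consistent, which is easy to check. The sketch below recomputes the totals from the numbers in the metadata. It then reports each split's share of examples and its average size per example (kB here means 1000 bytes).

```python
# Split statistics copied from the dataset metadata.
SPLITS = {
    "train": {"num_bytes": 3321514817, "num_examples": 38438},
    "valid": {"num_bytes": 122746000, "num_examples": 396},
    "test": {"num_bytes": 77106001, "num_examples": 514},
}
DOWNLOAD_SIZE = 1047406436
DATASET_SIZE = 3521366818

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The per-split byte counts add up exactly to the reported dataset size.
assert total_bytes == DATASET_SIZE

for name, s in SPLITS.items():
    share = s["num_examples"] / total_examples
    avg_kb = s["num_bytes"] / s["num_examples"] / 1e3
    print(f"{name}: {share:.1%} of examples, ~{avg_kb:.0f} kB per example")

# Ratio of the uncompressed dataset size to the download size.
print(f"expansion ratio: {DATASET_SIZE / DOWNLOAD_SIZE:.2f}x")
```

Running it shows that the train split holds about 97.7% of the examples. Valid and test rows are larger on average: roughly 310 kB and 150 kB per example, versus about 86 kB for train. The expansion ratio of about 3.36x compares the reported dataset size with the download size.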
Configuration
- default: maps the train, valid, and test splits to the files matching data/train-*, data/valid-*, and data/test-*.
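The default config assigns files to splits by filename prefix. The sketch below approximates that matching with Python's `fnmatch.fnmatchcase`. The shard filenames are hypothetical. The `datasets` library resolves these patterns with its own logic, which can differ in edge cases; for instance, `fnmatch`'s `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split -> glob pattern, copied from the `configs` section of the metadata.
DATA_FILES = {
    "train": "data/train-*",
    "valid": "data/valid-*",
    "test": "data/test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    for path in [
        "data/train-00000-of-00007.parquet",
        "data/valid-00000-of-00001.parquet",
        "README.md",
    ]:
        print(path, "->", split_for(path))
```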
Task Categories
- Text generation
- Text-to-text generation
- Question answering
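For the text-to-text generation task, a natural pairing uses `description` as the input and `code` as the target. The sketch below shows one such pairing. The prompt wording and the helper name `to_text2text_pair` are hypothetical choices, not something the dataset prescribes.

```python
# Label names copied from the dataset metadata; index i is class label i.
LANGUAGE_NAMES = ["UNKNOWN", "Python2", "C++", "Python3", "JAVA"]


def to_text2text_pair(record: dict) -> tuple[str, str]:
    """Build a hypothetical (input, target) pair from one dataset row."""
    language = LANGUAGE_NAMES[record["language"]]
    prompt = f"Solve the following problem in {language}.\n\n{record['description']}"
    return prompt, record["code"]


if __name__ == "__main__":
    # Made-up row for illustration.
    row = {
        "description": "Read two integers and print their sum.",
        "code": "a, b = map(int, input().split())\nprint(a + b)",
        "language": 3,
    }
    prompt, target = to_text2text_pair(row)
    print(prompt)
    print(target)
```

Keeping the language name in the prompt lets one model be trained on rows from every language while still being told which language to produce.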
Tags
- Code
Size Category
- 10K<n<100K



