manu/code_5p_data_separate
Source: Hugging Face. Updated: 2023-09-30. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/manu/code_5p_data_separate
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: StarcoderdataPythonTrain
    path: data/StarcoderdataPythonTrain-*
  - split: StarcoderdataPythonTest
    path: data/StarcoderdataPythonTest-*
  - split: StarcoderdataMarkdownTrain
    path: data/StarcoderdataMarkdownTrain-*
  - split: StarcoderdataMarkdownTest
    path: data/StarcoderdataMarkdownTest-*
  - split: StarcoderdataJupyterScriptsDedupFilteredTrain
    path: data/StarcoderdataJupyterScriptsDedupFilteredTrain-*
  - split: StarcoderdataJupyterScriptsDedupFilteredTest
    path: data/StarcoderdataJupyterScriptsDedupFilteredTest-*
  - split: StarcoderdataJupyterStructuredCleanDedupTrain
    path: data/StarcoderdataJupyterStructuredCleanDedupTrain-*
  - split: StarcoderdataJupyterStructuredCleanDedupTest
    path: data/StarcoderdataJupyterStructuredCleanDedupTest-*
  - split: StarcoderdataJsonTrain
    path: data/StarcoderdataJsonTrain-*
  - split: StarcoderdataJsonTest
    path: data/StarcoderdataJsonTest-*
  - split: CodeContestsTrain
    path: data/CodeContestsTrain-*
  - split: CodeContestsTest
    path: data/CodeContestsTest-*
  - split: PypiCleanTrain
    path: data/PypiCleanTrain-*
  - split: PypiCleanTest
    path: data/PypiCleanTest-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: dataset_id
    dtype: string
  splits:
  - name: StarcoderdataPythonTrain
    num_bytes: 3077290405
    num_examples: 643232
  - name: StarcoderdataPythonTest
    num_bytes: 546326
    num_examples: 100
  - name: StarcoderdataMarkdownTrain
    num_bytes: 4054448273
    num_examples: 1051364
  - name: StarcoderdataMarkdownTest
    num_bytes: 680799
    num_examples: 100
  - name: StarcoderdataJupyterScriptsDedupFilteredTrain
    num_bytes: 401590417
    num_examples: 45626
  - name: StarcoderdataJupyterScriptsDedupFilteredTest
    num_bytes: 724111
    num_examples: 100
  - name: StarcoderdataJupyterStructuredCleanDedupTrain
    num_bytes: 316718609
    num_examples: 33337
  - name: StarcoderdataJupyterStructuredCleanDedupTest
    num_bytes: 971655
    num_examples: 100
  - name: StarcoderdataJsonTrain
    num_bytes: 291208312
    num_examples: 237477
  - name: StarcoderdataJsonTest
    num_bytes: 112941
    num_examples: 100
  - name: CodeContestsTrain
    num_bytes: 151487748
    num_examples: 78717
  - name: CodeContestsTest
    num_bytes: 79396
    num_examples: 42
  - name: PypiCleanTrain
    num_bytes: 1549670299
    num_examples: 121809
  - name: PypiCleanTest
    num_bytes: 1718599
    num_examples: 100
  download_size: 4213817063
  dataset_size: 9847247890
---
# Dataset Card for "code_5p_data_separate"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
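
The card itself gives no usage instructions, so here is a minimal sketch of how one of the splits declared in the metadata might be loaded. It assumes the Hugging Face `datasets` library is installed and that the repository is still reachable on the Hub. The helper names `split_name` and `load_split` are hypothetical, and streaming is chosen only to avoid fetching the full download up front.

```python
# Subset prefixes taken from the split names in the card's metadata.
SUBSETS = [
    "StarcoderdataPython",
    "StarcoderdataMarkdown",
    "StarcoderdataJupyterScriptsDedupFiltered",
    "StarcoderdataJupyterStructuredCleanDedup",
    "StarcoderdataJson",
    "CodeContests",
    "PypiClean",
]


def split_name(subset: str, part: str) -> str:
    """Build a split name such as 'CodeContestsTest' from a subset and 'train'/'test'."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if part not in ("train", "test"):
        raise ValueError("part must be 'train' or 'test'")
    return subset + part.capitalize()


def load_split(subset: str, part: str = "test", streaming: bool = True):
    """Load one split of manu/code_5p_data_separate (hypothetical helper)."""
    # Imported lazily so the naming helper works without the library installed.
    from datasets import load_dataset

    return load_dataset(
        "manu/code_5p_data_separate",
        split=split_name(subset, part),
        streaming=streaming,
    )


# Example (requires network access and `pip install datasets`):
# ds = load_split("CodeContests", "test")
# first = next(iter(ds))
# print(first["id"], first["dataset_id"], len(first["text"]))
```

Each record should expose the three string fields listed under `features`: `id`, `text`, and `dataset_id`.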
Provider:
manu
Summary of original information
Dataset overview
Configuration
- Default configuration:
  - Data files:
    - StarcoderdataPythonTrain: data/StarcoderdataPythonTrain-*
    - StarcoderdataPythonTest: data/StarcoderdataPythonTest-*
    - StarcoderdataMarkdownTrain: data/StarcoderdataMarkdownTrain-*
    - StarcoderdataMarkdownTest: data/StarcoderdataMarkdownTest-*
    - StarcoderdataJupyterScriptsDedupFilteredTrain: data/StarcoderdataJupyterScriptsDedupFilteredTrain-*
    - StarcoderdataJupyterScriptsDedupFilteredTest: data/StarcoderdataJupyterScriptsDedupFilteredTest-*
    - StarcoderdataJupyterStructuredCleanDedupTrain: data/StarcoderdataJupyterStructuredCleanDedupTrain-*
    - StarcoderdataJupyterStructuredCleanDedupTest: data/StarcoderdataJupyterStructuredCleanDedupTest-*
    - StarcoderdataJsonTrain: data/StarcoderdataJsonTrain-*
    - StarcoderdataJsonTest: data/StarcoderdataJsonTest-*
    - CodeContestsTrain: data/CodeContestsTrain-*
    - CodeContestsTest: data/CodeContestsTest-*
    - PypiCleanTrain: data/PypiCleanTrain-*
    - PypiCleanTest: data/PypiCleanTest-*
Dataset information
- Features:
  - id: string
  - text: string
  - dataset_id: string
- Splits:
  - StarcoderdataPythonTrain: 3077290405 bytes, 643232 examples
  - StarcoderdataPythonTest: 546326 bytes, 100 examples
  - StarcoderdataMarkdownTrain: 4054448273 bytes, 1051364 examples
  - StarcoderdataMarkdownTest: 680799 bytes, 100 examples
  - StarcoderdataJupyterScriptsDedupFilteredTrain: 401590417 bytes, 45626 examples
  - StarcoderdataJupyterScriptsDedupFilteredTest: 724111 bytes, 100 examples
  - StarcoderdataJupyterStructuredCleanDedupTrain: 316718609 bytes, 33337 examples
  - StarcoderdataJupyterStructuredCleanDedupTest: 971655 bytes, 100 examples
  - StarcoderdataJsonTrain: 291208312 bytes, 237477 examples
  - StarcoderdataJsonTest: 112941 bytes, 100 examples
  - CodeContestsTrain: 151487748 bytes, 78717 examples
  - CodeContestsTest: 79396 bytes, 42 examples
  - PypiCleanTrain: 1549670299 bytes, 121809 examples
  - PypiCleanTest: 1718599 bytes, 100 examples
- Download size: 4213817063 bytes
- Dataset size: 9847247890 bytes
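
The reported sizes can be cross-checked with a little arithmetic. The sketch below copies the per-split byte and example counts from the metadata, sums them, and compares the byte total against the reported dataset size. Variable names are illustrative, and reading the download-to-dataset ratio as the effect of compressed storage is an assumption, not something the card states.

```python
# (num_bytes, num_examples) per split, copied from the card's metadata.
SPLITS = {
    "StarcoderdataPythonTrain": (3077290405, 643232),
    "StarcoderdataPythonTest": (546326, 100),
    "StarcoderdataMarkdownTrain": (4054448273, 1051364),
    "StarcoderdataMarkdownTest": (680799, 100),
    "StarcoderdataJupyterScriptsDedupFilteredTrain": (401590417, 45626),
    "StarcoderdataJupyterScriptsDedupFilteredTest": (724111, 100),
    "StarcoderdataJupyterStructuredCleanDedupTrain": (316718609, 33337),
    "StarcoderdataJupyterStructuredCleanDedupTest": (971655, 100),
    "StarcoderdataJsonTrain": (291208312, 237477),
    "StarcoderdataJsonTest": (112941, 100),
    "CodeContestsTrain": (151487748, 78717),
    "CodeContestsTest": (79396, 42),
    "PypiCleanTrain": (1549670299, 121809),
    "PypiCleanTest": (1718599, 100),
}
DOWNLOAD_SIZE = 4213817063  # bytes, as reported
DATASET_SIZE = 9847247890   # bytes, as reported

# Sum of the per-split byte counts.
total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())

# Example counts grouped by the Train/Test suffix of the split name.
train_examples = sum(n for name, (_, n) in SPLITS.items() if name.endswith("Train"))
test_examples = sum(n for name, (_, n) in SPLITS.items() if name.endswith("Test"))

# Fraction of the in-memory dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"sum of split bytes: {total_bytes}")
print(f"matches dataset_size: {total_bytes == DATASET_SIZE}")
print(f"train examples: {train_examples}, test examples: {test_examples}")
print(f"download/dataset ratio: {download_ratio:.3f}")
```

The per-split byte counts add up exactly to the reported dataset size of 9847247890 bytes, and the download is a little under half that size.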



