thangvip/cosmopedia_vi_khanacademy
Source: Hugging Face | Updated: 2024-04-17 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/thangvip/cosmopedia_vi_khanacademy
Resource description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: text_token_length
    dtype: int64
  - name: text
    dtype: string
  - name: seed_data
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  - name: vi_text
    dtype: string
  splits:
  - name: 0_set
    num_bytes: 7768262
    num_examples: 1000
  - name: 1_set
    num_bytes: 8315002
    num_examples: 1000
  - name: 2_set
    num_bytes: 8679110
    num_examples: 1000
  - name: 3_set
    num_bytes: 8989839
    num_examples: 1000
  - name: 4_set
    num_bytes: 8970072
    num_examples: 1000
  - name: 5_set
    num_bytes: 8759024
    num_examples: 1000
  - name: 6_set
    num_bytes: 8894284
    num_examples: 1000
  - name: 7_set
    num_bytes: 8417565
    num_examples: 1000
  - name: 8_set
    num_bytes: 9219928
    num_examples: 1000
  - name: 9_set
    num_bytes: 8654204
    num_examples: 1000
  - name: 10_set
    num_bytes: 8389535
    num_examples: 1000
  - name: 11_set
    num_bytes: 8782979
    num_examples: 1000
  - name: 12_set
    num_bytes: 8329504
    num_examples: 1000
  - name: 13_set
    num_bytes: 8262062
    num_examples: 1000
  - name: 14_set
    num_bytes: 7740216
    num_examples: 1000
  - name: 15_set
    num_bytes: 8613023
    num_examples: 1000
  - name: 16_set
    num_bytes: 8537311
    num_examples: 1000
  - name: 17_set
    num_bytes: 8736438
    num_examples: 1000
  - name: 18_set
    num_bytes: 9339629
    num_examples: 1000
  - name: 19_set
    num_bytes: 9512623
    num_examples: 1000
  - name: 20_set
    num_bytes: 9208246
    num_examples: 1000
  - name: 21_set
    num_bytes: 9601798
    num_examples: 1000
  - name: 22_set
    num_bytes: 9641493
    num_examples: 1000
  - name: 23_set
    num_bytes: 9464960
    num_examples: 1000
  download_size: 87231350
  dataset_size: 210827107
configs:
- config_name: default
  data_files:
  - split: 0_set
    path: data/0_set-*
  - split: 1_set
    path: data/1_set-*
  - split: 2_set
    path: data/2_set-*
  - split: 3_set
    path: data/3_set-*
  - split: 4_set
    path: data/4_set-*
  - split: 5_set
    path: data/5_set-*
  - split: 6_set
    path: data/6_set-*
  - split: 7_set
    path: data/7_set-*
  - split: 8_set
    path: data/8_set-*
  - split: 9_set
    path: data/9_set-*
  - split: 10_set
    path: data/10_set-*
  - split: 11_set
    path: data/11_set-*
  - split: 12_set
    path: data/12_set-*
  - split: 13_set
    path: data/13_set-*
  - split: 14_set
    path: data/14_set-*
  - split: 15_set
    path: data/15_set-*
  - split: 16_set
    path: data/16_set-*
  - split: 17_set
    path: data/17_set-*
  - split: 18_set
    path: data/18_set-*
  - split: 19_set
    path: data/19_set-*
  - split: 20_set
    path: data/20_set-*
  - split: 21_set
    path: data/21_set-*
  - split: 22_set
    path: data/22_set-*
  - split: 23_set
    path: data/23_set-*
---
Provided by:
thangvip
Summary of Original Information
Dataset Overview
Dataset Features
- prompt: string
- text_token_length: integer (int64)
- text: string
- seed_data: string
- format: string
- audience: string
- vi_text: string
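Before training or analysis, rows can be checked against the seven features listed above. The sketch below assumes each row arrives as a plain Python dict, for example after converting a loaded split to Python objects. SCHEMA and schema_errors are hypothetical helper names and are not part of any library. This page does not document what the fields mean, so the check covers only field names and types.

```python
# Expected Python type for each feature, following the schema listed above
# (int64 is represented by Python's int).
SCHEMA: dict[str, type] = {
    "prompt": str,
    "text_token_length": int,
    "text": str,
    "seed_data": str,
    "format": str,
    "audience": str,
    "vi_text": str,
}


def schema_errors(row: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the row conforms."""
    errors = []
    for name, expected in SCHEMA.items():
        if name not in row:
            errors.append(f"missing field: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    # Report any keys the schema does not define, in a stable order.
    for name in sorted(row.keys() - SCHEMA.keys()):
        errors.append(f"unexpected field: {name}")
    return errors


# Hypothetical row with placeholder values (not real dataset content).
example = {name: "..." for name in SCHEMA}
example["text_token_length"] = 512

print(schema_errors(example))  # []
print(schema_errors({**example, "text_token_length": "512"}))
# ['text_token_length: expected int, got str']
```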
Dataset Splits
- 0_set to 23_set: each split contains 1000 examples. Sizes are as follows:
  - 0_set: 7768262 bytes
  - 1_set: 8315002 bytes
  - 2_set: 8679110 bytes
  - 3_set: 8989839 bytes
  - 4_set: 8970072 bytes
  - 5_set: 8759024 bytes
  - 6_set: 8894284 bytes
  - 7_set: 8417565 bytes
  - 8_set: 9219928 bytes
  - 9_set: 8654204 bytes
  - 10_set: 8389535 bytes
  - 11_set: 8782979 bytes
  - 12_set: 8329504 bytes
  - 13_set: 8262062 bytes
  - 14_set: 7740216 bytes
  - 15_set: 8613023 bytes
  - 16_set: 8537311 bytes
  - 17_set: 8736438 bytes
  - 18_set: 9339629 bytes
  - 19_set: 9512623 bytes
  - 20_set: 9208246 bytes
  - 21_set: 9601798 bytes
  - 22_set: 9641493 bytes
  - 23_set: 9464960 bytes
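Every split holds the same number of examples, so the list above can be cross-checked against the totals reported elsewhere on this page. Here is a small Python sketch for that. The dictionary only restates the figures above, and nothing is fetched from the Hub.

```python
# Size in bytes of each split, restated from the list above.
SPLIT_BYTES = {
    "0_set": 7768262, "1_set": 8315002, "2_set": 8679110, "3_set": 8989839,
    "4_set": 8970072, "5_set": 8759024, "6_set": 8894284, "7_set": 8417565,
    "8_set": 9219928, "9_set": 8654204, "10_set": 8389535, "11_set": 8782979,
    "12_set": 8329504, "13_set": 8262062, "14_set": 7740216, "15_set": 8613023,
    "16_set": 8537311, "17_set": 8736438, "18_set": 9339629, "19_set": 9512623,
    "20_set": 9208246, "21_set": 9601798, "22_set": 9641493, "23_set": 9464960,
}
EXAMPLES_PER_SPLIT = 1000
REPORTED_DATASET_SIZE = 210827107  # "dataset_size" reported on this page

total_bytes = sum(SPLIT_BYTES.values())
total_examples = EXAMPLES_PER_SPLIT * len(SPLIT_BYTES)
smallest = min(SPLIT_BYTES, key=SPLIT_BYTES.get)
largest = max(SPLIT_BYTES, key=SPLIT_BYTES.get)

print(len(SPLIT_BYTES), total_examples)     # 24 24000
print(total_bytes == REPORTED_DATASET_SIZE)  # True
print(smallest, SPLIT_BYTES[smallest])       # 14_set 7740216
print(largest, SPLIT_BYTES[largest])         # 22_set 9641493
```

The per-split sizes add up exactly to the reported total of 210827107 bytes. This suggests that no split is missing from the listing.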
Dataset Size
- Download size: 87231350 bytes
- Total dataset size: 210827107 bytes
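The two totals can be related with a little arithmetic. The sketch below assumes the following about the two fields:

- The download size counts the data files as stored in the repository.
- The dataset size counts the decoded data.

That is the usual meaning of these fields in Hugging Face dataset metadata, but this page does not state it.

```python
DOWNLOAD_SIZE = 87_231_350   # bytes, "download size" above
DATASET_SIZE = 210_827_107   # bytes, "total dataset size" above
NUM_EXAMPLES = 24 * 1000     # 24 splits x 1000 examples each

# Fraction of the decoded size that actually has to be downloaded.
ratio = DOWNLOAD_SIZE / DATASET_SIZE
# Average decoded size of one example across all splits.
bytes_per_example = DATASET_SIZE / NUM_EXAMPLES

print(f"download/dataset ratio: {ratio:.3f}")                   # 0.414
print(f"avg size per example: {bytes_per_example:.0f} bytes")  # 8784
```

Under this reading, the download is roughly 41% of the decoded size, and each example averages a little under 9 KB.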
Configuration
- Default config (default): maps each of the 24 splits to its data file path pattern (data/<split>-*)
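The default configuration maps each split name to a wildcard pattern over files in data/. The Python sketch below rebuilds that mapping. It approximates the matching with the standard-library fnmatch, which is an assumption: it does not reproduce how the Hub itself resolves the patterns. The sketch shows that the "_set-" part of each pattern keeps data/1_set-* from also capturing files of 11_set or 21_set. The shard file names used below are hypothetical.

```python
from fnmatch import fnmatch

NUM_SPLITS = 24

# Rebuild the default config's split -> path-pattern mapping from the YAML above.
data_files = {f"{i}_set": f"data/{i}_set-*" for i in range(NUM_SPLITS)}


def split_for(path: str) -> list[str]:
    """Return every split whose pattern matches the given file path."""
    return [split for split, pattern in data_files.items() if fnmatch(path, pattern)]


# Hypothetical shard file names; the real names in the repository may differ.
print(split_for("data/1_set-00000-of-00001.parquet"))   # ['1_set']
print(split_for("data/11_set-00000-of-00001.parquet"))  # ['11_set']
```

In practice, the Hugging Face datasets library resolves these patterns itself. A single split can be requested with a call such as load_dataset("thangvip/cosmopedia_vi_khanacademy", split="0_set").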



