
thangvip/cosmopedia_vi_khanacademy

Source: Hugging Face | Updated: 2024-04-17 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/thangvip/cosmopedia_vi_khanacademy
Resource description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: text_token_length
    dtype: int64
  - name: text
    dtype: string
  - name: seed_data
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  - name: vi_text
    dtype: string
  splits:
  - name: 0_set
    num_bytes: 7768262
    num_examples: 1000
  - name: 1_set
    num_bytes: 8315002
    num_examples: 1000
  - name: 2_set
    num_bytes: 8679110
    num_examples: 1000
  - name: 3_set
    num_bytes: 8989839
    num_examples: 1000
  - name: 4_set
    num_bytes: 8970072
    num_examples: 1000
  - name: 5_set
    num_bytes: 8759024
    num_examples: 1000
  - name: 6_set
    num_bytes: 8894284
    num_examples: 1000
  - name: 7_set
    num_bytes: 8417565
    num_examples: 1000
  - name: 8_set
    num_bytes: 9219928
    num_examples: 1000
  - name: 9_set
    num_bytes: 8654204
    num_examples: 1000
  - name: 10_set
    num_bytes: 8389535
    num_examples: 1000
  - name: 11_set
    num_bytes: 8782979
    num_examples: 1000
  - name: 12_set
    num_bytes: 8329504
    num_examples: 1000
  - name: 13_set
    num_bytes: 8262062
    num_examples: 1000
  - name: 14_set
    num_bytes: 7740216
    num_examples: 1000
  - name: 15_set
    num_bytes: 8613023
    num_examples: 1000
  - name: 16_set
    num_bytes: 8537311
    num_examples: 1000
  - name: 17_set
    num_bytes: 8736438
    num_examples: 1000
  - name: 18_set
    num_bytes: 9339629
    num_examples: 1000
  - name: 19_set
    num_bytes: 9512623
    num_examples: 1000
  - name: 20_set
    num_bytes: 9208246
    num_examples: 1000
  - name: 21_set
    num_bytes: 9601798
    num_examples: 1000
  - name: 22_set
    num_bytes: 9641493
    num_examples: 1000
  - name: 23_set
    num_bytes: 9464960
    num_examples: 1000
  download_size: 87231350
  dataset_size: 210827107
configs:
- config_name: default
  data_files:
  - split: 0_set
    path: data/0_set-*
  - split: 1_set
    path: data/1_set-*
  - split: 2_set
    path: data/2_set-*
  - split: 3_set
    path: data/3_set-*
  - split: 4_set
    path: data/4_set-*
  - split: 5_set
    path: data/5_set-*
  - split: 6_set
    path: data/6_set-*
  - split: 7_set
    path: data/7_set-*
  - split: 8_set
    path: data/8_set-*
  - split: 9_set
    path: data/9_set-*
  - split: 10_set
    path: data/10_set-*
  - split: 11_set
    path: data/11_set-*
  - split: 12_set
    path: data/12_set-*
  - split: 13_set
    path: data/13_set-*
  - split: 14_set
    path: data/14_set-*
  - split: 15_set
    path: data/15_set-*
  - split: 16_set
    path: data/16_set-*
  - split: 17_set
    path: data/17_set-*
  - split: 18_set
    path: data/18_set-*
  - split: 19_set
    path: data/19_set-*
  - split: 20_set
    path: data/20_set-*
  - split: 21_set
    path: data/21_set-*
  - split: 22_set
    path: data/22_set-*
  - split: 23_set
    path: data/23_set-*
---
Provider:
thangvip
Summary of original information

Dataset overview

Dataset features

  • prompt: string
  • text_token_length: integer (int64)
  • text: string
  • seed_data: string
  • format: string
  • audience: string
  • vi_text: string
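
The feature list above can be turned into a simple record check. The following is a minimal sketch, not part of the dataset's own tooling: the helper name `schema_problems` is hypothetical, and the sample record in the usage note uses made-up placeholder values rather than real dataset rows.

```python
from typing import Any, Dict, List

# Field names and types taken from the feature list above
# (int64 is mapped to Python's int, string to str).
EXPECTED_FIELDS = {
    "prompt": str,
    "text_token_length": int,
    "text": str,
    "seed_data": str,
    "format": str,
    "audience": str,
    "vi_text": str,
}


def schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems
```

For example, a record with all seven fields and the right types yields an empty list, while a record whose `text_token_length` is a string is reported as a type mismatch.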

Dataset splits

  • 0_set to 23_set: each split contains 1000 examples, with the following sizes:
    • 0_set: 7768262 bytes
    • 1_set: 8315002 bytes
    • 2_set: 8679110 bytes
    • 3_set: 8989839 bytes
    • 4_set: 8970072 bytes
    • 5_set: 8759024 bytes
    • 6_set: 8894284 bytes
    • 7_set: 8417565 bytes
    • 8_set: 9219928 bytes
    • 9_set: 8654204 bytes
    • 10_set: 8389535 bytes
    • 11_set: 8782979 bytes
    • 12_set: 8329504 bytes
    • 13_set: 8262062 bytes
    • 14_set: 7740216 bytes
    • 15_set: 8613023 bytes
    • 16_set: 8537311 bytes
    • 17_set: 8736438 bytes
    • 18_set: 9339629 bytes
    • 19_set: 9512623 bytes
    • 20_set: 9208246 bytes
    • 21_set: 9601798 bytes
    • 22_set: 9641493 bytes
    • 23_set: 9464960 bytes

Dataset size

  • Download size: 87231350 bytes
  • Total dataset size: 210827107 bytes
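
The size figures can be cross-checked with a few lines of arithmetic. The sketch below hard-codes the 24 per-split byte counts and the two totals listed on this page, confirms that the splits add up to the total dataset size, and computes the ratio of download size to dataset size. The variable names are illustrative. The smaller download size is presumably due to compressed storage, which is an assumption and not stated on the page.

```python
# Per-split sizes in bytes, in order from 0_set to 23_set (from this page).
SPLIT_BYTES = [
    7768262, 8315002, 8679110, 8989839, 8970072, 8759024,
    8894284, 8417565, 9219928, 8654204, 8389535, 8782979,
    8329504, 8262062, 7740216, 8613023, 8537311, 8736438,
    9339629, 9512623, 9208246, 9601798, 9641493, 9464960,
]
EXAMPLES_PER_SPLIT = 1000
DOWNLOAD_SIZE = 87231350   # bytes, as reported
DATASET_SIZE = 210827107   # bytes, as reported

total_bytes = sum(SPLIT_BYTES)
total_examples = EXAMPLES_PER_SPLIT * len(SPLIT_BYTES)
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE
mean_bytes_per_example = total_bytes / total_examples

print(f"splits: {len(SPLIT_BYTES)}, examples: {total_examples}")
print(f"sum of split sizes matches dataset size: {total_bytes == DATASET_SIZE}")
print(f"download size / dataset size: {download_ratio:.3f}")
print(f"mean bytes per example: {mean_bytes_per_example:.0f}")
```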

Configuration

  • Default configuration: data file path settings covering all 24 splits
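
Because the default configuration exposes 24 named splits rather than a single `train` split, a loader has to ask for each split by name. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and the repository is still reachable under the ID shown on this page. The helper names `split_names` and `load_split` are hypothetical.

```python
REPO_ID = "thangvip/cosmopedia_vi_khanacademy"
NUM_SPLITS = 24  # 0_set through 23_set, per the configuration above


def split_names(count: int = NUM_SPLITS) -> list:
    """Build the split names 0_set, 1_set, ... in order."""
    return [f"{i}_set" for i in range(count)]


def load_split(name: str):
    """Load one named split; needs network access and the `datasets` package."""
    # Imported lazily so the name helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=name)


# Example usage (downloads data, so it is left commented out):
# first = load_split("0_set")
# print(first.column_names)
```

Loading splits one at a time keeps memory use modest. If a single combined dataset is needed, the individual splits could be merged afterwards with `datasets.concatenate_datasets`.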