
anhnv125/ud_alpaca

Source: Hugging Face. Updated: 2024-03-05. Indexed: 2024-06-22.
Download link:
https://hf-mirror.com/datasets/anhnv125/ud_alpaca
Resource description:
---
dataset_info:
- config_name: be_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 66768815
    num_examples: 21555
  - name: validation
    num_bytes: 3370351
    num_examples: 1090
  - name: test
    num_bytes: 2873580
    num_examples: 889
  download_size: 5480853
  dataset_size: 73012746
- config_name: bxr_bdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 56167
    num_examples: 19
  - name: test
    num_bytes: 2821495
    num_examples: 908
  download_size: 228304
  dataset_size: 2877662
- config_name: cs_pdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 216399245
    num_examples: 68495
  - name: validation
    num_bytes: 29301204
    num_examples: 9270
  - name: test
    num_bytes: 32048085
    num_examples: 10148
  download_size: 25707376
  dataset_size: 277748534
- config_name: de_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 44307017
    num_examples: 13814
  - name: validation
    num_bytes: 2477610
    num_examples: 799
  - name: test
    num_bytes: 3070360
    num_examples: 977
  download_size: 4999156
  dataset_size: 49854987
- config_name: en_ewt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 38805886
    num_examples: 12543
  - name: validation
    num_bytes: 6000641
    num_examples: 2002
  - name: test
    num_bytes: 6198885
    num_examples: 2077
  download_size: 3810046
  dataset_size: 51005412
- config_name: es_ancora
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 49943950
    num_examples: 14305
  - name: validation
    num_bytes: 5798461
    num_examples: 1654
  - name: test
    num_bytes: 5985191
    num_examples: 1721
  download_size: 8063762
  dataset_size: 61727602
- config_name: fr_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 48157929
    num_examples: 14449
  - name: validation
    num_bytes: 4906593
    num_examples: 1476
  - name: test
    num_bytes: 1378398
    num_examples: 416
  download_size: 6341149
  dataset_size: 54442920
- config_name: hsb_ufal
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 74433
    num_examples: 23
  - name: test
    num_bytes: 1963315
    num_examples: 623
  download_size: 218777
  dataset_size: 2037748
- config_name: kk_ktb
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 102630
    num_examples: 31
  - name: test
    num_bytes: 3176663
    num_examples: 1047
  download_size: 257360
  dataset_size: 3279293
- config_name: lt_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 501163
    num_examples: 153
  - name: validation
    num_bytes: 501163
    num_examples: 153
  - name: test
    num_bytes: 501163
    num_examples: 153
  download_size: 229455
  dataset_size: 1503489
- config_name: ru_syntagrus
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 163096580
    num_examples: 48814
  - name: validation
    num_bytes: 21977495
    num_examples: 6584
  - name: test
    num_bytes: 21691135
    num_examples: 6491
  download_size: 21623891
  dataset_size: 206765210
configs:
- config_name: be_hse
  data_files:
  - split: train
    path: be_hse/train-*
  - split: validation
    path: be_hse/validation-*
  - split: test
    path: be_hse/test-*
- config_name: bxr_bdt
  data_files:
  - split: train
    path: bxr_bdt/train-*
  - split: test
    path: bxr_bdt/test-*
- config_name: cs_pdt
  data_files:
  - split: train
    path: cs_pdt/train-*
  - split: validation
    path: cs_pdt/validation-*
  - split: test
    path: cs_pdt/test-*
- config_name: de_gsd
  data_files:
  - split: train
    path: de_gsd/train-*
  - split: validation
    path: de_gsd/validation-*
  - split: test
    path: de_gsd/test-*
- config_name: en_ewt
  data_files:
  - split: train
    path: en_ewt/train-*
  - split: validation
    path: en_ewt/validation-*
  - split: test
    path: en_ewt/test-*
- config_name: es_ancora
  data_files:
  - split: train
    path: es_ancora/train-*
  - split: validation
    path: es_ancora/validation-*
  - split: test
    path: es_ancora/test-*
- config_name: fr_gsd
  data_files:
  - split: train
    path: fr_gsd/train-*
  - split: validation
    path: fr_gsd/validation-*
  - split: test
    path: fr_gsd/test-*
- config_name: hsb_ufal
  data_files:
  - split: train
    path: hsb_ufal/train-*
  - split: test
    path: hsb_ufal/test-*
- config_name: kk_ktb
  data_files:
  - split: train
    path: kk_ktb/train-*
  - split: test
    path: kk_ktb/test-*
- config_name: lt_hse
  data_files:
  - split: train
    path: lt_hse/train-*
  - split: validation
    path: lt_hse/validation-*
  - split: test
    path: lt_hse/test-*
- config_name: ru_syntagrus
  data_files:
  - split: train
    path: ru_syntagrus/train-*
  - split: validation
    path: ru_syntagrus/validation-*
  - split: test
    path: ru_syntagrus/test-*
---
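The metadata above lists a num_bytes value per split and a dataset_size per configuration. As a rough sanity check, the sketch below confirms that the split sizes sum to the stated dataset_size for three of the configurations. The numbers are copied from the metadata; the names SPLIT_BYTES, DATASET_SIZE, and size_mismatches are hypothetical helpers introduced only for this illustration.

```python
# Split sizes in bytes, copied from the dataset_info metadata
# (three configurations shown; the others follow the same pattern).
SPLIT_BYTES = {
    "be_hse": {"train": 66768815, "validation": 3370351, "test": 2873580},
    "bxr_bdt": {"train": 56167, "test": 2821495},
    "lt_hse": {"train": 501163, "validation": 501163, "test": 501163},
}

# Stated dataset_size per configuration, also copied from the metadata.
DATASET_SIZE = {"be_hse": 73012746, "bxr_bdt": 2877662, "lt_hse": 1503489}


def size_mismatches(split_bytes, dataset_size):
    """Return the configurations whose split bytes do not sum to dataset_size."""
    return [
        name
        for name, splits in split_bytes.items()
        if sum(splits.values()) != dataset_size[name]
    ]


if __name__ == "__main__":
    # An empty list means every listed configuration is internally consistent.
    print(size_mismatches(SPLIT_BYTES, DATASET_SIZE))
```

Note that lt_hse reports identical byte and example counts for all three splits; the check only confirms that the totals add up, not that the splits are distinct.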
Provider:
anhnv125
Summary of original information

Dataset overview

Dataset configurations

be_hse

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 66768815 bytes, 21555 examples
    • validation: 3370351 bytes, 1090 examples
    • test: 2873580 bytes, 889 examples
  • Download size: 5480853 bytes
  • Dataset size: 73012746 bytes

bxr_bdt

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 56167 bytes, 19 examples
    • test: 2821495 bytes, 908 examples
  • Download size: 228304 bytes
  • Dataset size: 2877662 bytes

cs_pdt

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 216399245 bytes, 68495 examples
    • validation: 29301204 bytes, 9270 examples
    • test: 32048085 bytes, 10148 examples
  • Download size: 25707376 bytes
  • Dataset size: 277748534 bytes

de_gsd

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 44307017 bytes, 13814 examples
    • validation: 2477610 bytes, 799 examples
    • test: 3070360 bytes, 977 examples
  • Download size: 4999156 bytes
  • Dataset size: 49854987 bytes

en_ewt

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 38805886 bytes, 12543 examples
    • validation: 6000641 bytes, 2002 examples
    • test: 6198885 bytes, 2077 examples
  • Download size: 3810046 bytes
  • Dataset size: 51005412 bytes

es_ancora

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 49943950 bytes, 14305 examples
    • validation: 5798461 bytes, 1654 examples
    • test: 5985191 bytes, 1721 examples
  • Download size: 8063762 bytes
  • Dataset size: 61727602 bytes

fr_gsd

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 48157929 bytes, 14449 examples
    • validation: 4906593 bytes, 1476 examples
    • test: 1378398 bytes, 416 examples
  • Download size: 6341149 bytes
  • Dataset size: 54442920 bytes

hsb_ufal

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 74433 bytes, 23 examples
    • test: 1963315 bytes, 623 examples
  • Download size: 218777 bytes
  • Dataset size: 2037748 bytes

kk_ktb

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 102630 bytes, 31 examples
    • test: 3176663 bytes, 1047 examples
  • Download size: 257360 bytes
  • Dataset size: 3279293 bytes

lt_hse

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 501163 bytes, 153 examples
    • validation: 501163 bytes, 153 examples
    • test: 501163 bytes, 153 examples
  • Download size: 229455 bytes
  • Dataset size: 1503489 bytes

ru_syntagrus

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train: 163096580 bytes, 48814 examples
    • validation: 21977495 bytes, 6584 examples
    • test: 21691135 bytes, 6491 examples
  • Download size: 21623891 bytes
  • Dataset size: 206765210 bytes
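The configurations listed above do not all expose the same splits: bxr_bdt, hsb_ufal, and kk_ktb have only train and test. The following is a minimal sketch of loading one split with the Hugging Face `datasets` library while guarding against a missing split. It assumes `datasets` is installed and the repository is reachable; SPLITS and load_split are hypothetical names introduced here, not part of the dataset.

```python
# Splits available per configuration, as listed in this summary.
FULL = ("train", "validation", "test")
NO_VALIDATION = ("train", "test")

SPLITS = {
    "be_hse": FULL,
    "bxr_bdt": NO_VALIDATION,
    "cs_pdt": FULL,
    "de_gsd": FULL,
    "en_ewt": FULL,
    "es_ancora": FULL,
    "fr_gsd": FULL,
    "hsb_ufal": NO_VALIDATION,
    "kk_ktb": NO_VALIDATION,
    "lt_hse": FULL,
    "ru_syntagrus": FULL,
}


def load_split(config, split, repo_id="anhnv125/ud_alpaca"):
    """Load one split of one configuration, failing early if it is not listed."""
    if config not in SPLITS:
        raise ValueError(f"unknown configuration: {config}")
    if split not in SPLITS[config]:
        raise ValueError(f"{config} has no {split} split")
    # Imported lazily so the split table can be used without the library.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split=split)


if __name__ == "__main__":
    # Requires network access; each record should carry the three string
    # fields instruction, input, and output described above.
    test_set = load_split("en_ewt", "test")
    print(test_set[0]["instruction"])
```

Checking the split table first avoids a network round trip for a request that cannot succeed, such as asking for the validation split of kk_ktb.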