five

tomg-group-umd/GenQA_raw

收藏
Hugging Face2024-06-13 更新2024-06-29 收录
下载链接:
https://hf-mirror.com/datasets/tomg-group-umd/GenQA_raw
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: - config_name: academic features: - name: user dtype: string - name: assistant dtype: string - name: prompt dtype: string - name: template dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 8614955916 num_examples: 4210076 download_size: 4070747258 dataset_size: 8614955916 - config_name: code features: - name: user dtype: string - name: assistant dtype: string - name: prompt dtype: string - name: template dtype: string - name: category dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 855686195 num_examples: 513483 download_size: 370326167 dataset_size: 855686195 - config_name: dialog features: - name: user dtype: string - name: assistant dtype: string - name: user2 dtype: string - name: assistant2 dtype: string - name: user3 dtype: string - name: assistant3 dtype: string - name: user4 dtype: string - name: assistant4 dtype: string - name: prompt dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 2613783708 num_examples: 819154 download_size: 1226407538 dataset_size: 2613783708 - config_name: general features: - name: user dtype: string - name: assistant dtype: string - name: template dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 377010471 num_examples: 304920 download_size: 211886096 dataset_size: 377010471 - config_name: math features: - name: user dtype: string - name: assistant dtype: string - name: user2 dtype: string - name: assistant2 dtype: string - name: prompt dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 912151884 num_examples: 515509 download_size: 271708327 dataset_size: 912151884 - config_name: mmlu features: - name: user dtype: string - name: assistant dtype: string - name: prompt dtype: string - name: template dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 4523835106 num_examples: 2409841 download_size: 2104540276 dataset_size: 4523835106 - config_name: multiple_choice features: - name: user dtype: string - name: assistant dtype: string - name: prompt dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 555013194 num_examples: 372610 download_size: 215020093 dataset_size: 555013194 - config_name: task features: - name: user dtype: string - name: assistant dtype: string - name: user2 dtype: string - name: assistant2 dtype: string - name: prompt dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 2160397568 num_examples: 1004179 download_size: 881027426 dataset_size: 2160397568 - config_name: writing features: - name: user dtype: string - name: assistant dtype: string - name: user2 dtype: string - name: assistant2 dtype: string - name: prompt dtype: string - name: template dtype: string - name: idx dtype: int64 splits: - name: train num_bytes: 2947982996 num_examples: 932362 download_size: 1346605382 dataset_size: 2947982996 configs: - config_name: academic data_files: - split: train path: academic/train-* - config_name: code data_files: - split: train path: code/train-* - config_name: dialog data_files: - split: train path: dialog/train-* - config_name: general data_files: - split: train path: general/train-* - config_name: math data_files: - split: train path: math/train-* - config_name: mmlu data_files: - split: train path: mmlu/train-* - config_name: multiple_choice data_files: - split: train path: multiple_choice/train-* - config_name: task data_files: - split: train path: task/train-* - config_name: writing data_files: - split: train path: writing/train-* ---
提供机构:
tomg-group-umd
原始信息汇总

数据集概述

数据集配置

学术 (academic)

  • 特征:
    • user: string
    • assistant: string
    • prompt: string
    • template: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 8614955916
      • 样本数: 4210076
  • 下载大小: 4070747258
  • 数据集大小: 8614955916
  • 数据文件路径: academic/train-*

代码 (code)

  • 特征:
    • user: string
    • assistant: string
    • prompt: string
    • template: string
    • category: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 855686195
      • 样本数: 513483
  • 下载大小: 370326167
  • 数据集大小: 855686195
  • 数据文件路径: code/train-*

对话 (dialog)

  • 特征:
    • user: string
    • assistant: string
    • user2: string
    • assistant2: string
    • user3: string
    • assistant3: string
    • user4: string
    • assistant4: string
    • prompt: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 2613783708
      • 样本数: 819154
  • 下载大小: 1226407538
  • 数据集大小: 2613783708
  • 数据文件路径: dialog/train-*

通用 (general)

  • 特征:
    • user: string
    • assistant: string
    • template: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 377010471
      • 样本数: 304920
  • 下载大小: 211886096
  • 数据集大小: 377010471
  • 数据文件路径: general/train-*

数学 (math)

  • 特征:
    • user: string
    • assistant: string
    • user2: string
    • assistant2: string
    • prompt: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 912151884
      • 样本数: 515509
  • 下载大小: 271708327
  • 数据集大小: 912151884
  • 数据文件路径: math/train-*

MMLU (mmlu)

  • 特征:
    • user: string
    • assistant: string
    • prompt: string
    • template: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 4523835106
      • 样本数: 2409841
  • 下载大小: 2104540276
  • 数据集大小: 4523835106
  • 数据文件路径: mmlu/train-*

多选题 (multiple_choice)

  • 特征:
    • user: string
    • assistant: string
    • prompt: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 555013194
      • 样本数: 372610
  • 下载大小: 215020093
  • 数据集大小: 555013194
  • 数据文件路径: multiple_choice/train-*

任务 (task)

  • 特征:
    • user: string
    • assistant: string
    • user2: string
    • assistant2: string
    • prompt: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 2160397568
      • 样本数: 1004179
  • 下载大小: 881027426
  • 数据集大小: 2160397568
  • 数据文件路径: task/train-*

写作 (writing)

  • 特征:
    • user: string
    • assistant: string
    • user2: string
    • assistant2: string
    • prompt: string
    • template: string
    • idx: int64
  • 分割:
    • train:
      • 字节数: 2947982996
      • 样本数: 932362
  • 下载大小: 1346605382
  • 数据集大小: 2947982996
  • 数据文件路径: writing/train-*
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作