
wisenut-nlp-team/llama_ch

Source: Hugging Face. Updated: 2024-05-07. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/wisenut-nlp-team/llama_ch
Resource description:
---
dataset_info:
- config_name: chat
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1422208016
    num_examples: 6820506
  download_size: 670955718
  dataset_size: 1422208016
- config_name: closed_qa
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 17174089
    num_examples: 10142
  download_size: 4145020
  dataset_size: 17174089
- config_name: smr
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1138324417
    num_examples: 2400591
  download_size: 694024048
  dataset_size: 1138324417
configs:
- config_name: chat
  data_files:
  - split: train
    path: chat/train-*
- config_name: closed_qa
  data_files:
  - split: train
    path: closed_qa/train-*
- config_name: smr
  data_files:
  - split: train
    path: smr/train-*
---

The dataset has three configurations: chat, closed_qa, and smr. All three share the same feature schema of instruction, input, and output, each of type string, and each configuration provides a single train split. The chat configuration is the largest, with 6,820,506 examples (1,422,208,016 bytes). The closed_qa configuration is the smallest, with 10,142 examples (17,174,089 bytes). The smr configuration falls in between, with 2,400,591 examples (1,138,324,417 bytes).
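As a rough illustration of how one of these configurations might be loaded, the sketch below uses the Hugging Face `datasets` library. The dataset ID, configuration names, and split name come from the metadata above; the helper name `load_config` and the choice to stream by default are assumptions made for this example, not something the dataset card prescribes.

```python
"""Minimal sketch: load one configuration of wisenut-nlp-team/llama_ch."""

DATASET_ID = "wisenut-nlp-team/llama_ch"
CONFIG_NAMES = ("chat", "closed_qa", "smr")      # from the dataset card
FEATURES = ("instruction", "input", "output")    # all string-typed


def load_config(name, streaming=True):
    """Return the train split of one configuration.

    `load_config` is a hypothetical helper for this example. Streaming is
    assumed as the default so the larger configurations are not downloaded
    in full before the first record can be read.
    """
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset
    return load_dataset(DATASET_ID, name, split="train", streaming=streaming)


# Example (requires network access and `pip install datasets`):
#   ds = load_config("closed_qa", streaming=False)
#   print({key: ds[0][key] for key in FEATURES})
```

The closed_qa configuration is probably the most convenient one to experiment with first, since its listed download size is only 4,145,020 bytes.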
Provider:
wisenut-nlp-team
Summary of original information

Dataset overview

Configuration name: chat

  • Features:
    • Instruction (instruction): string
    • Input (input): string
    • Output (output): string
  • Splits:
    • Training set (train):
      • Examples: 6,820,506
      • Dataset size: 1,422,208,016 bytes
      • Download size: 670,955,718 bytes

Configuration name: closed_qa

  • Features:
    • Instruction (instruction): string
    • Input (input): string
    • Output (output): string
  • Splits:
    • Training set (train):
      • Examples: 10,142
      • Dataset size: 17,174,089 bytes
      • Download size: 4,145,020 bytes

Configuration name: smr

  • Features:
    • Instruction (instruction): string
    • Input (input): string
    • Output (output): string
  • Splits:
    • Training set (train):
      • Examples: 2,400,591
      • Dataset size: 1,138,324,417 bytes
      • Download size: 694,024,048 bytes
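
The per-configuration figures above can be combined into a few derived numbers, such as the average record size and the ratio of download size to dataset size. The sketch below does that arithmetic; the values are copied from the listing, while the names `SPLITS` and `derive` are illustrative only.

```python
# Figures copied from the dataset metadata (train split of each configuration).
SPLITS = {
    "chat":      {"num_examples": 6820506, "num_bytes": 1422208016, "download_size": 670955718},
    "closed_qa": {"num_examples": 10142,   "num_bytes": 17174089,   "download_size": 4145020},
    "smr":       {"num_examples": 2400591, "num_bytes": 1138324417, "download_size": 694024048},
}


def derive(meta):
    """Compute average bytes per example and the download/dataset size ratio."""
    return {
        "bytes_per_example": meta["num_bytes"] / meta["num_examples"],
        "download_ratio": meta["download_size"] / meta["num_bytes"],
    }


stats = {name: derive(meta) for name, meta in SPLITS.items()}

# Sum each raw field across the three configurations.
totals = {
    key: sum(meta[key] for meta in SPLITS.values())
    for key in ("num_examples", "num_bytes", "download_size")
}

for name, s in stats.items():
    print(f"{name:10s} {s['bytes_per_example']:8.1f} bytes/example, "
          f"download/dataset = {s['download_ratio']:.2f}")
print(totals)
```

On these figures, closed_qa has by far the longest records on average (about 1.7 KB each, against roughly 209 bytes for chat and 474 bytes for smr), and the three configurations together contain 9,231,239 examples.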