wisenut-nlp-team/llama_ch
Source: Hugging Face. Updated: 2024-05-07. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/wisenut-nlp-team/llama_ch
Resource description:
---
dataset_info:
- config_name: chat
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1422208016
    num_examples: 6820506
  download_size: 670955718
  dataset_size: 1422208016
- config_name: closed_qa
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 17174089
    num_examples: 10142
  download_size: 4145020
  dataset_size: 17174089
- config_name: smr
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1138324417
    num_examples: 2400591
  download_size: 694024048
  dataset_size: 1138324417
configs:
- config_name: chat
  data_files:
  - split: train
    path: chat/train-*
- config_name: closed_qa
  data_files:
  - split: train
    path: closed_qa/train-*
- config_name: smr
  data_files:
  - split: train
    path: smr/train-*
---
The dataset includes three configurations: chat, closed_qa, and smr. All three share the same feature structure: instruction, input, and output, each of string type. Each configuration has a single train split, and the splits differ in size. The chat configuration is the largest, with 6,820,506 examples and 1,422,208,016 bytes. The closed_qa configuration is the smallest, with 10,142 examples and 17,174,089 bytes. The smr configuration's train split falls in between, with 2,400,591 examples and 1,138,324,417 bytes.
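As a rough illustration of how the three configurations might be used, the sketch below loads one configuration's train split with the Hugging Face `datasets` library and joins a record's three string fields into one text block. The helper names `load_config` and `format_record` are hypothetical, and the "Instruction / Input / Output" layout is an assumption made for illustration, not a template published by the dataset authors.

```python
from typing import Dict

REPO_ID = "wisenut-nlp-team/llama_ch"
CONFIGS = ("chat", "closed_qa", "smr")


def load_config(name: str, streaming: bool = False):
    """Load the train split of one configuration (hypothetical helper)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)


def format_record(record: Dict[str, str]) -> str:
    """Join instruction, input, and output into one text block.

    The layout is an illustrative assumption; the `input` line is
    skipped when that field is empty.
    """
    parts = [f"Instruction: {record['instruction']}"]
    if record["input"]:
        parts.append(f"Input: {record['input']}")
    parts.append(f"Output: {record['output']}")
    return "\n".join(parts)
```

Calling `load_config("closed_qa")` would fetch the smallest configuration. For chat and smr, passing `streaming=True` is one way to iterate over records without downloading the whole split first.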
Provided by:
wisenut-nlp-team
Original information summary
Dataset overview
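Configuration name: chat
- Features:
  - Instruction (instruction): string type
  - Input (input): string type
  - Output (output): string type
- Data splits:
  - Training set (train):
    - Number of examples: 6820506
    - Storage size: 1422208016 bytes
    - Download size: 670955718 bytes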
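Configuration name: closed_qa
- Features:
  - Instruction (instruction): string type
  - Input (input): string type
  - Output (output): string type
- Data splits:
  - Training set (train):
    - Number of examples: 10142
    - Storage size: 17174089 bytes
    - Download size: 4145020 bytes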
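Configuration name: smr
- Features:
  - Instruction (instruction): string type
  - Input (input): string type
  - Output (output): string type
- Data splits:
  - Training set (train):
    - Number of examples: 2400591
    - Storage size: 1138324417 bytes
    - Download size: 694024048 bytes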
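The per-configuration figures listed above can be compared with a little arithmetic. The sketch below copies the numbers from the metadata and derives the average stored bytes per example and the ratio of download size to stored size. The function names are hypothetical, and treating that ratio as a rough indicator of on-disk compression is an assumption.

```python
# Figures copied from the dataset_info metadata for each configuration.
STATS = {
    "chat": {
        "num_examples": 6820506,
        "dataset_size": 1422208016,
        "download_size": 670955718,
    },
    "closed_qa": {
        "num_examples": 10142,
        "dataset_size": 17174089,
        "download_size": 4145020,
    },
    "smr": {
        "num_examples": 2400591,
        "dataset_size": 1138324417,
        "download_size": 694024048,
    },
}


def bytes_per_example(config: str) -> float:
    """Average stored bytes per example in the train split."""
    s = STATS[config]
    return s["dataset_size"] / s["num_examples"]


def download_ratio(config: str) -> float:
    """Download size divided by stored size (smaller means more compact files)."""
    s = STATS[config]
    return s["download_size"] / s["dataset_size"]


def total(field: str) -> int:
    """Sum one metadata field across all three configurations."""
    return sum(s[field] for s in STATS.values())


if __name__ == "__main__":
    for name in STATS:
        print(
            f"{name}: {bytes_per_example(name):.1f} bytes/example, "
            f"download ratio {download_ratio(name):.2f}"
        )
    print("total examples:", total("num_examples"))
```

On these figures, closed_qa has by far the longest records on average (about 1,693 bytes per example, against about 474 for smr and about 209 for chat). The smr download is also larger than the chat download even though its stored size is smaller.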



