ShakaRover/ChineseDatasets
Source: Hugging Face. Updated 2024-05-09. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/ShakaRover/ChineseDatasets
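A minimal sketch of fetching one configuration with the Hugging Face `datasets` library, assuming that library is installed and that the repository ID `ShakaRover/ChineseDatasets` resolves on the Hub. The helper name `load_config` is hypothetical; the configuration names are the ones listed in the card metadata in the resource description. Routing traffic through the mirror host (for example via the `HF_ENDPOINT` environment variable) is an assumption about the reader's setup, not something the card states.

```python
"""Sketch: load one configuration of ShakaRover/ChineseDatasets."""

REPO_ID = "ShakaRover/ChineseDatasets"

# Configuration names as listed in the dataset card metadata.
CONFIG_NAMES = (
    "brainteasers",
    "chinese_traditional",
    "douban",
    "ruozhiba",
    "wiki",
    "xiaoyuer",
)


def load_config(name: str, split: str = "train"):
    """Load one configuration; `load_config` is a hypothetical helper name."""
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    # Every configuration in the card defines a single `train` split.
    return load_dataset(REPO_ID, name, split=split)
```

Calling `load_config("ruozhiba")` needs network access and would return the `train` split of that configuration; an unknown name fails early with a `ValueError` before any download is attempted.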
Resource description:
---
license: apache-2.0
dataset_info:
- config_name: brainteasers
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  - name: input
    dtype: string
  splits:
  - name: train
    num_bytes: 422714
    num_examples: 3647
  download_size: 238602
  dataset_size: 422714
- config_name: chinese_traditional
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  splits:
  - name: train
    num_bytes: 765878
    num_examples: 1111
  download_size: 477547
  dataset_size: 765878
- config_name: douban
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  splits:
  - name: train
    num_bytes: 5405784
    num_examples: 3086
  download_size: 3399539
  dataset_size: 5405784
- config_name: ruozhiba
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 527470
    num_examples: 1496
  download_size: 358672
  dataset_size: 527470
- config_name: wiki
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  splits:
  - name: train
    num_bytes: 27846693
    num_examples: 10603
  download_size: 10409376
  dataset_size: 27846693
- config_name: xiaoyuer
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 9891
    num_examples: 133
  download_size: 6241
  dataset_size: 9891
configs:
- config_name: brainteasers
  data_files:
  - split: train
    path: brainteasers/train-*
- config_name: chinese_traditional
  data_files:
  - split: train
    path: chinese_traditional/train-*
- config_name: douban
  data_files:
  - split: train
    path: douban/train-*
- config_name: ruozhiba
  data_files:
  - split: train
    path: ruozhiba/train-*
- config_name: wiki
  data_files:
  - split: train
    path: wiki/train-*
- config_name: xiaoyuer
  data_files:
  - split: train
    path: xiaoyuer/train-*
---
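The metadata above declares three different feature sets: `ruozhiba` has only `instruction` and `output`, `brainteasers` and `xiaoyuer` add `input`, and the remaining configurations carry extra annotation columns. The following is a minimal sketch, assuming rows arrive as plain dictionaries, of projecting any row onto the three fields the configurations have in common. The names `SftRecord` and `to_common` are hypothetical, and the sample rows are made-up placeholders rather than dataset content.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SftRecord:
    """Common three-field shape (hypothetical name, not part of the card)."""

    instruction: str
    input: str
    output: str


def to_common(row: Mapping[str, Any]) -> SftRecord:
    """Project a row from any configuration onto the common shape.

    A missing or null `input` (as in `ruozhiba`) becomes an empty string;
    annotation columns such as `task_type` or `copyright` are dropped.
    """
    return SftRecord(
        instruction=row["instruction"],
        input=row.get("input") or "",
        output=row["output"],
    )


# Illustrative rows with placeholder text, not taken from the dataset.
ruozhiba_row = {"instruction": "question text", "output": "answer text"}
wiki_row = {
    "instruction": "question text",
    "input": "context text",
    "output": "answer text",
    "task_type": {"major": ["qa"], "minor": ["open"]},
    "domain": ["general"],
    "human_verified": True,
}

print(to_common(ruozhiba_row))
print(to_common(wiki_row))
```

Dropping the annotation columns is a design choice for mixing configurations in one training set; keeping them would require a wider record type with optional fields.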
Provider:
ShakaRover
Summary of original information
Dataset overview
1. Brainteasers
- Features:
  - instruction: string
  - output: string
  - input: string
- Splits:
  - train: 3647 examples, 422714 bytes in total
- Download size: 238602 bytes
- Dataset size: 422714 bytes
2. Chinese Traditional
- Features:
  - instruction: string
  - input: string
  - output: string
  - task_type: struct with fields major and minor, each a sequence of strings
  - domain: sequence of strings
  - metadata: string
  - answer_from: string
  - human_verified: boolean
  - copyright: string
- Splits:
  - train: 1111 examples, 765878 bytes in total
- Download size: 477547 bytes
- Dataset size: 765878 bytes
3. Douban
- Features:
  - instruction: string
  - input: string
  - output: string
  - task_type: struct with fields major and minor, each a sequence of strings
  - domain: sequence of strings
  - metadata: string
  - answer_from: string
  - human_verified: boolean
  - copyright: string
- Splits:
  - train: 3086 examples, 5405784 bytes in total
- Download size: 3399539 bytes
- Dataset size: 5405784 bytes
4. Ruozhiba
- Features:
  - instruction: string
  - output: string
- Splits:
  - train: 1496 examples, 527470 bytes in total
- Download size: 358672 bytes
- Dataset size: 527470 bytes
5. Wiki
- Features:
  - instruction: string
  - input: string
  - output: string
  - task_type: struct with fields major and minor, each a sequence of strings
  - domain: sequence of strings
  - metadata: string
  - answer_from: string
  - human_verified: boolean
  - copyright: string
- Splits:
  - train: 10603 examples, 27846693 bytes in total
- Download size: 10409376 bytes
- Dataset size: 27846693 bytes
6. Xiaoyuer
- Features:
  - input: string
  - output: string
  - instruction: string
- Splits:
  - train: 133 examples, 9891 bytes in total
- Download size: 6241 bytes
- Dataset size: 9891 bytes
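
The per-configuration figures in this overview can be combined into collection-wide totals and simple ratios. The sketch below copies the numbers from the card into a hypothetical `STATS` table and derives average bytes per example and the download-to-dataset size ratio for each configuration; the variable names are illustrative only.

```python
# (num_examples, dataset_size, download_size) in bytes, copied from the card.
STATS = {
    "brainteasers": (3647, 422714, 238602),
    "chinese_traditional": (1111, 765878, 477547),
    "douban": (3086, 5405784, 3399539),
    "ruozhiba": (1496, 527470, 358672),
    "wiki": (10603, 27846693, 10409376),
    "xiaoyuer": (133, 9891, 6241),
}

total_examples = sum(n for n, _, _ in STATS.values())
total_bytes = sum(size for _, size, _ in STATS.values())
total_download = sum(dl for _, _, dl in STATS.values())

for name, (n, size, dl) in STATS.items():
    # Average dataset bytes per example, and download size over dataset size.
    print(f"{name:20s} {size / n:9.1f} B/example  ratio {dl / size:.2f}")

print(
    f"total: {total_examples} examples, {total_bytes} bytes, "
    f"{total_download} bytes to download"
)
```

Summed this way, the six configurations hold 20076 examples in 34978430 bytes, with 14889977 bytes to download; `wiki` alone accounts for more than half of the examples.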



