HydraLM/partitioned_v2_split
Source: Hugging Face. Updated 2023-07-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HydraLM/partitioned_v2_split
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: '0'
    path: data/0-*
  - split: '1'
    path: data/1-*
  - split: '2'
    path: data/2-*
  - split: '3'
    path: data/3-*
  - split: '4'
    path: data/4-*
  - split: '5'
    path: data/5-*
  - split: '6'
    path: data/6-*
  - split: '7'
    path: data/7-*
  - split: '8'
    path: data/8-*
  - split: '9'
    path: data/9-*
  - split: '10'
    path: data/10-*
  - split: '11'
    path: data/11-*
  - split: '12'
    path: data/12-*
  - split: '13'
    path: data/13-*
  - split: '14'
    path: data/14-*
  - split: '15'
    path: data/15-*
dataset_info:
  features:
  - name: conversations
    list:
    - name: input
      dtype: string
    - name: instruction
      dtype: string
    - name: response
      dtype: string
  - name: conversation_id
    dtype: int64
  - name: dataset_id
    dtype: string
  - name: cluster_text
    dtype: string
  - name: embedding
    sequence: float64
  - name: cluster
    dtype: int64
  - name: unique_id
    dtype: string
  splits:
  - name: '0'
    num_bytes: 779602139
    num_examples: 57463
  - name: '1'
    num_bytes: 716142691
    num_examples: 47816
  - name: '2'
    num_bytes: 376723531
    num_examples: 43276
  - name: '3'
    num_bytes: 271125675
    num_examples: 37872
  - name: '4'
    num_bytes: 334527340
    num_examples: 42303
  - name: '5'
    num_bytes: 428843979
    num_examples: 44084
  - name: '6'
    num_bytes: 285189781
    num_examples: 39017
  - name: '7'
    num_bytes: 350378889
    num_examples: 30775
  - name: '8'
    num_bytes: 261834062
    num_examples: 33594
  - name: '9'
    num_bytes: 165750034
    num_examples: 19440
  - name: '10'
    num_bytes: 137592285
    num_examples: 11770
  - name: '11'
    num_bytes: 688937855
    num_examples: 69955
  - name: '12'
    num_bytes: 239948606
    num_examples: 22717
  - name: '13'
    num_bytes: 377427901
    num_examples: 50626
  - name: '14'
    num_bytes: 343568172
    num_examples: 41822
  - name: '15'
    num_bytes: 711665879
    num_examples: 79575
  download_size: 4399745966
  dataset_size: 6469258819
---
# Dataset Card for "partitioned_v2_split"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
HydraLM
Summary of original information
Dataset overview
Configuration
- config_name: default
- data_files:
  - One file pattern per split, from data/0-* through data/15-* (16 splits in total).
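
The mapping from split name to file pattern described above can be sketched in Python. This is a minimal sketch, not the maintainers' loading code: `split_pattern` and `load_partition` are hypothetical helper names, and the loader assumes the third-party `datasets` library is installed and that the repository is reachable under the ID shown in the download link.

```python
NUM_SPLITS = 16  # splits are named '0' through '15' in the default config


def split_pattern(split: int) -> str:
    """Return the file glob that the default config assigns to a split."""
    if not 0 <= split < NUM_SPLITS:
        raise ValueError(f"split must be in 0..{NUM_SPLITS - 1}, got {split}")
    return f"data/{split}-*"


def load_partition(split: int, streaming: bool = True):
    """Load one split by name (hypothetical helper; needs network access).

    The import is kept local so that the rest of this sketch runs
    without the `datasets` package installed.
    """
    from datasets import load_dataset

    split_pattern(split)  # reuse the range check
    return load_dataset(
        "HydraLM/partitioned_v2_split",
        split=str(split),      # split names are strings such as '0'
        streaming=streaming,   # avoid downloading the full split up front
    )
```

Streaming is used as the default here because individual splits are several hundred megabytes; passing `streaming=False` would download and cache the split instead.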
Dataset information
- features:
  - conversations: list of records, each with the fields
    - input: string
    - instruction: string
    - response: string
  - conversation_id: int64
  - dataset_id: string
  - cluster_text: string
  - embedding: sequence of float64
  - cluster: int64
  - unique_id: string
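
The feature schema above can be written down as Python types, which may help when processing records. This is a sketch under assumptions: the class names `Turn` and `Record` and the helper `turn_to_text` are hypothetical, and the way the three text fields are joined is only one plausible convention, since the dataset card does not say how they are meant to be combined.

```python
from typing import List, TypedDict


class Turn(TypedDict):
    """One entry of the `conversations` list."""
    input: str
    instruction: str
    response: str


class Record(TypedDict):
    """One example, mirroring the feature names and dtypes in the card."""
    conversations: List[Turn]
    conversation_id: int
    dataset_id: str
    cluster_text: str
    embedding: List[float]
    cluster: int
    unique_id: str


def turn_to_text(turn: Turn) -> str:
    """Join instruction, optional input, and response with newlines.

    Assumption: an empty `input` means the turn has no extra context.
    """
    parts = [turn["instruction"]]
    if turn["input"]:
        parts.append(turn["input"])
    parts.append(turn["response"])
    return "\n".join(parts)
```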
Dataset splits
- splits:
  - 0: 779602139 bytes, 57463 examples
  - 1: 716142691 bytes, 47816 examples
  - 2: 376723531 bytes, 43276 examples
  - 3: 271125675 bytes, 37872 examples
  - 4: 334527340 bytes, 42303 examples
  - 5: 428843979 bytes, 44084 examples
  - 6: 285189781 bytes, 39017 examples
  - 7: 350378889 bytes, 30775 examples
  - 8: 261834062 bytes, 33594 examples
  - 9: 165750034 bytes, 19440 examples
  - 10: 137592285 bytes, 11770 examples
  - 11: 688937855 bytes, 69955 examples
  - 12: 239948606 bytes, 22717 examples
  - 13: 377427901 bytes, 50626 examples
  - 14: 343568172 bytes, 41822 examples
  - 15: 711665879 bytes, 79575 examples
- download_size: 4399745966 bytes
- dataset_size: 6469258819 bytes
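
As a consistency check on the figures above, the per-split sizes can be summed and compared with the reported `dataset_size`. The sketch below copies the numbers from the card; the variable names are illustrative only.

```python
# (num_bytes, num_examples) per split, copied from the dataset card.
SPLITS = {
    "0": (779602139, 57463),
    "1": (716142691, 47816),
    "2": (376723531, 43276),
    "3": (271125675, 37872),
    "4": (334527340, 42303),
    "5": (428843979, 44084),
    "6": (285189781, 39017),
    "7": (350378889, 30775),
    "8": (261834062, 33594),
    "9": (165750034, 19440),
    "10": (137592285, 11770),
    "11": (688937855, 69955),
    "12": (239948606, 22717),
    "13": (377427901, 50626),
    "14": (343568172, 41822),
    "15": (711665879, 79575),
}
DOWNLOAD_SIZE = 4399745966
DATASET_SIZE = 6469258819

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())

# Splits with the most and the fewest examples.
largest = max(SPLITS, key=lambda name: SPLITS[name][1])
smallest = min(SPLITS, key=lambda name: SPLITS[name][1])

# Ratio of download size to in-memory dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total_bytes == DATASET_SIZE)   # True: the split sizes add up
print(total_examples)                # 672105
print(largest, smallest)             # 15 10
```

The split sizes sum exactly to `dataset_size`, and the 16 splits together hold 672105 examples. The splits are far from uniform: split 15 has the most examples and split 10 the fewest.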



