HydraLM/partitioned_v2_split

Name: HydraLM/partitioned_v2_split
Creator: HydraLM
Published: 2023-07-30 08:04:56
License: 暂无描述

Hugging Face2023-07-30 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/HydraLM/partitioned_v2_split

下载链接

链接失效反馈

官方服务：

资源简介：

--- configs: - config_name: default data_files: - split: '0' path: data/0-* - split: '1' path: data/1-* - split: '2' path: data/2-* - split: '3' path: data/3-* - split: '4' path: data/4-* - split: '5' path: data/5-* - split: '6' path: data/6-* - split: '7' path: data/7-* - split: '8' path: data/8-* - split: '9' path: data/9-* - split: '10' path: data/10-* - split: '11' path: data/11-* - split: '12' path: data/12-* - split: '13' path: data/13-* - split: '14' path: data/14-* - split: '15' path: data/15-* dataset_info: features: - name: conversations list: - name: input dtype: string - name: instruction dtype: string - name: response dtype: string - name: conversation_id dtype: int64 - name: dataset_id dtype: string - name: cluster_text dtype: string - name: embedding sequence: float64 - name: cluster dtype: int64 - name: unique_id dtype: string splits: - name: '0' num_bytes: 779602139 num_examples: 57463 - name: '1' num_bytes: 716142691 num_examples: 47816 - name: '2' num_bytes: 376723531 num_examples: 43276 - name: '3' num_bytes: 271125675 num_examples: 37872 - name: '4' num_bytes: 334527340 num_examples: 42303 - name: '5' num_bytes: 428843979 num_examples: 44084 - name: '6' num_bytes: 285189781 num_examples: 39017 - name: '7' num_bytes: 350378889 num_examples: 30775 - name: '8' num_bytes: 261834062 num_examples: 33594 - name: '9' num_bytes: 165750034 num_examples: 19440 - name: '10' num_bytes: 137592285 num_examples: 11770 - name: '11' num_bytes: 688937855 num_examples: 69955 - name: '12' num_bytes: 239948606 num_examples: 22717 - name: '13' num_bytes: 377427901 num_examples: 50626 - name: '14' num_bytes: 343568172 num_examples: 41822 - name: '15' num_bytes: 711665879 num_examples: 79575 download_size: 4399745966 dataset_size: 6469258819 --- # Dataset Card for "partitioned_v2_split" [More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

提供机构：

HydraLM

原始信息汇总

数据集概述

配置信息

config_name: default
data_files:
- 包含多个文件，每个文件对应一个split，如data/0-*至data/15-*。

数据集信息

features:
- conversations:
  - input: 数据类型为string
  - instruction: 数据类型为string
  - response: 数据类型为string
- conversation_id: 数据类型为int64
- dataset_id: 数据类型为string
- cluster_text: 数据类型为string
- embedding: 数据类型为float64，序列类型
- cluster: 数据类型为int64
- unique_id: 数据类型为string

数据集分割

splits:
- 0: 大小779602139字节，包含57463个样本
- 1: 大小716142691字节，包含47816个样本
- 2: 大小376723531字节，包含43276个样本
- 3: 大小271125675字节，包含37872个样本
- 4: 大小334527340字节，包含42303个样本
- 5: 大小428843979字节，包含44084个样本
- 6: 大小285189781字节，包含39017个样本
- 7: 大小350378889字节，包含30775个样本
- 8: 大小261834062字节，包含33594个样本
- 9: 大小165750034字节，包含19440个样本
- 10: 大小137592285字节，包含11770个样本
- 11: 大小688937855字节，包含69955个样本
- 12: 大小239948606字节，包含22717个样本
- 13: 大小377427901字节，包含50626个样本
- 14: 大小343568172字节，包含41822个样本
- 15: 大小711665879字节，包含79575个样本
download_size: 4399745966字节
dataset_size: 6469258819字节

5,000+

优质数据集

54 个

任务类型

进入经典数据集