pharaouk/SkunkData-Corpus-Clusters-001
Source: Hugging Face. Updated 2023-09-15; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pharaouk/SkunkData-Corpus-Clusters-001
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: orca_0
    path: data/orca_0-*
  - split: instruct_0
    path: data/instruct_0-*
  - split: orca_1
    path: data/orca_1-*
  - split: instruct_1
    path: data/instruct_1-*
  - split: orca_2
    path: data/orca_2-*
  - split: instruct_2
    path: data/instruct_2-*
  - split: orca_3
    path: data/orca_3-*
  - split: instruct_3
    path: data/instruct_3-*
  - split: orca_4
    path: data/orca_4-*
  - split: instruct_4
    path: data/instruct_4-*
  - split: orca_5
    path: data/orca_5-*
  - split: instruct_5
    path: data/instruct_5-*
  - split: orca_6
    path: data/orca_6-*
  - split: instruct_6
    path: data/instruct_6-*
  - split: orca_7
    path: data/orca_7-*
  - split: instruct_7
    path: data/instruct_7-*
  - split: orca_8
    path: data/orca_8-*
  - split: instruct_8
    path: data/instruct_8-*
  - split: orca_9
    path: data/orca_9-*
  - split: instruct_9
    path: data/instruct_9-*
  - split: orca_10
    path: data/orca_10-*
  - split: instruct_10
    path: data/instruct_10-*
  - split: orca_11
    path: data/orca_11-*
  - split: instruct_11
    path: data/instruct_11-*
  - split: orca_12
    path: data/orca_12-*
  - split: instruct_12
    path: data/instruct_12-*
  - split: orca_13
    path: data/orca_13-*
  - split: instruct_13
    path: data/instruct_13-*
  - split: orca_14
    path: data/orca_14-*
  - split: instruct_14
    path: data/instruct_14-*
  - split: orca_15
    path: data/orca_15-*
  - split: instruct_15
    path: data/instruct_15-*
  - split: orca_16
    path: data/orca_16-*
  - split: instruct_16
    path: data/instruct_16-*
  - split: orca_17
    path: data/orca_17-*
  - split: instruct_17
    path: data/instruct_17-*
  - split: orca_18
    path: data/orca_18-*
  - split: instruct_18
    path: data/instruct_18-*
  - split: orca_19
    path: data/orca_19-*
  - split: instruct_19
    path: data/instruct_19-*
  - split: orca_20
    path: data/orca_20-*
  - split: instruct_20
    path: data/instruct_20-*
  - split: orca_21
    path: data/orca_21-*
  - split: instruct_21
    path: data/instruct_21-*
  - split: orca_22
    path: data/orca_22-*
  - split: instruct_22
    path: data/instruct_22-*
  - split: orca_23
    path: data/orca_23-*
  - split: instruct_23
    path: data/instruct_23-*
  - split: orca_24
    path: data/orca_24-*
  - split: instruct_24
    path: data/instruct_24-*
  - split: orca_25
    path: data/orca_25-*
  - split: instruct_25
    path: data/instruct_25-*
  - split: orca_26
    path: data/orca_26-*
  - split: instruct_26
    path: data/instruct_26-*
  - split: orca_27
    path: data/orca_27-*
  - split: instruct_27
    path: data/instruct_27-*
  - split: orca_28
    path: data/orca_28-*
  - split: instruct_28
    path: data/instruct_28-*
  - split: orca_29
    path: data/orca_29-*
  - split: instruct_29
    path: data/instruct_29-*
  - split: orca_30
    path: data/orca_30-*
  - split: instruct_30
    path: data/instruct_30-*
  - split: orca_31
    path: data/orca_31-*
  - split: instruct_31
    path: data/instruct_31-*
dataset_info:
  features:
  - name: message
    dtype: string
  - name: message_type
    dtype: string
  - name: message_id
    dtype: int64
  - name: conversation_id
    dtype: int64
  - name: dataset_id
    dtype: string
  - name: unique_conversation_id
    dtype: string
  - name: cluster
    dtype: float64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: orca_0
    num_bytes: 17849715
    num_examples: 18401
  - name: instruct_0
    num_bytes: 70074569
    num_examples: 81024
  - name: orca_1
    num_bytes: 23680133
    num_examples: 28584
  - name: instruct_1
    num_bytes: 82931087
    num_examples: 96749
  - name: orca_2
    num_bytes: 19980410
    num_examples: 17412
  - name: instruct_2
    num_bytes: 154000003
    num_examples: 124814
  - name: orca_3
    num_bytes: 17101778
    num_examples: 32038
  - name: instruct_3
    num_bytes: 49883928
    num_examples: 63327
  - name: orca_4
    num_bytes: 31656753
    num_examples: 34675
  - name: instruct_4
    num_bytes: 127695479
    num_examples: 126005
  - name: orca_5
    num_bytes: 16269511
    num_examples: 14092
  - name: instruct_5
    num_bytes: 61398228
    num_examples: 59076
  - name: orca_6
    num_bytes: 1342860
    num_examples: 2388
  - name: instruct_6
    num_bytes: 48450814
    num_examples: 66011
  - name: orca_7
    num_bytes: 44849080
    num_examples: 36172
  - name: instruct_7
    num_bytes: 65892068
    num_examples: 59876
  - name: orca_8
    num_bytes: 19352268
    num_examples: 18871
  - name: instruct_8
    num_bytes: 227627947
    num_examples: 170841
  - name: orca_9
    num_bytes: 14700372
    num_examples: 15315
  - name: instruct_9
    num_bytes: 64004683
    num_examples: 60637
  - name: orca_10
    num_bytes: 508915
    num_examples: 1446
  - name: instruct_10
    num_bytes: 24081225
    num_examples: 48031
  - name: orca_11
    num_bytes: 19443068
    num_examples: 19745
  - name: instruct_11
    num_bytes: 82438320
    num_examples: 80868
  - name: orca_12
    num_bytes: 4848059
    num_examples: 7172
  - name: instruct_12
    num_bytes: 166293672
    num_examples: 182113
  - name: orca_13
    num_bytes: 10599648
    num_examples: 19167
  - name: instruct_13
    num_bytes: 84060226
    num_examples: 152834
  - name: orca_14
    num_bytes: 15987021
    num_examples: 24048
  - name: instruct_14
    num_bytes: 59454799
    num_examples: 91972
  - name: orca_15
    num_bytes: 23903599
    num_examples: 24410
  - name: instruct_15
    num_bytes: 85555445
    num_examples: 84953
  - name: orca_16
    num_bytes: 23154299
    num_examples: 19289
  - name: instruct_16
    num_bytes: 101140401
    num_examples: 90731
  - name: orca_17
    num_bytes: 2152082
    num_examples: 3809
  - name: instruct_17
    num_bytes: 66472234
    num_examples: 80386
  - name: orca_18
    num_bytes: 83273007
    num_examples: 45544
  - name: instruct_18
    num_bytes: 110961860
    num_examples: 80604
  - name: orca_19
    num_bytes: 1386401
    num_examples: 1644
  - name: instruct_19
    num_bytes: 37424277
    num_examples: 42630
  - name: orca_20
    num_bytes: 15212013
    num_examples: 14602
  - name: instruct_20
    num_bytes: 94216681
    num_examples: 77830
  - name: orca_21
    num_bytes: 3440922
    num_examples: 4174
  - name: instruct_21
    num_bytes: 124095838
    num_examples: 87012
  - name: orca_22
    num_bytes: 11468080
    num_examples: 14191
  - name: instruct_22
    num_bytes: 63633991
    num_examples: 78980
  - name: orca_23
    num_bytes: 3591049
    num_examples: 3778
  - name: instruct_23
    num_bytes: 95699355
    num_examples: 69680
  - name: orca_24
    num_bytes: 1309953
    num_examples: 2395
  - name: instruct_24
    num_bytes: 82548064
    num_examples: 92642
  - name: orca_25
    num_bytes: 20598114
    num_examples: 18715
  - name: instruct_25
    num_bytes: 132539502
    num_examples: 99843
  - name: orca_26
    num_bytes: 31638864
    num_examples: 65463
  - name: instruct_26
    num_bytes: 52624322
    num_examples: 81968
  - name: orca_27
    num_bytes: 3056079
    num_examples: 5939
  - name: instruct_27
    num_bytes: 29071432
    num_examples: 55864
  - name: orca_28
    num_bytes: 12158143
    num_examples: 16039
  - name: instruct_28
    num_bytes: 67326019
    num_examples: 84243
  - name: orca_29
    num_bytes: 33228880
    num_examples: 65846
  - name: instruct_29
    num_bytes: 16788126
    num_examples: 21536
  - name: orca_30
    num_bytes: 1580412
    num_examples: 1991
  - name: instruct_30
    num_bytes: 15819978
    num_examples: 29766
  - name: orca_31
    num_bytes: 6719191
    num_examples: 11269
  - name: instruct_31
    num_bytes: 29009522
    num_examples: 47163
  download_size: 1412051638
  dataset_size: 3109254774
---
# Dataset Card for "SkunkData-Corpus-Clusters-001"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
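
The card metadata declares eight columns with fixed dtypes. As a minimal sketch, the check below maps those dtypes to Python types and reports where a row deviates from them. The function name `schema_problems` and the sample row values are hypothetical and are not taken from the dataset. The metadata does not say whether columns may be null, so `None` values are accepted here as an assumption.

```python
from typing import Any, Dict, List

# Column name -> Python type, transcribed from dataset_info.features in the card
# (string -> str, int64 -> int, float64 -> float).
EXPECTED_COLUMNS = {
    "message": str,
    "message_type": str,
    "message_id": int,
    "conversation_id": int,
    "dataset_id": str,
    "unique_conversation_id": str,
    "cluster": float,
    "__index_level_0__": int,
}


def schema_problems(row: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable mismatches between a row and the card's schema."""
    problems = []
    for name, expected in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif row[name] is not None and not isinstance(row[name], expected):
            # None is tolerated (assumption: nullability is not stated in the card).
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}"
            )
    for name in sorted(set(row) - set(EXPECTED_COLUMNS)):
        problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical row, invented purely to exercise the check.
sample_row = {
    "message": "Hello",
    "message_type": "input",
    "message_id": 0,
    "conversation_id": 1,
    "dataset_id": "example",
    "unique_conversation_id": "example_1",
    "cluster": 3.0,
    "__index_level_0__": 0,
}
print(schema_problems(sample_row))
```

Note that `cluster` is stored as `float64` even though the split names suggest integer indices, so a comparison such as `row["cluster"] == 3` works but the value itself is a float.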
Provider:
pharaouk
Summary of original information
Dataset overview
Dataset configuration
- Config name: default
- Data file paths: each split maps to `data/<split>-*`, for `orca_0` through `orca_31` and `instruct_0` through `instruct_31` (64 splits in total).
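
Because the split names follow a regular `<prefix>_<index>` pattern, they can be generated rather than typed out. The sketch below does that and wraps a lazy call to `datasets.load_dataset`. The helper names `split_names` and `load_split` are hypothetical, and the loader assumes the Hugging Face `datasets` package is installed and that the repository is reachable under the ID shown in the download link.

```python
from typing import List

REPO_ID = "pharaouk/SkunkData-Corpus-Clusters-001"
NUM_INDEXED_SPLITS = 32  # the card lists indices 0..31 for each prefix
PREFIXES = ("orca", "instruct")


def split_names(prefix: str) -> List[str]:
    """Return the 32 split names for one prefix, e.g. orca_0 .. orca_31."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return [f"{prefix}_{i}" for i in range(NUM_INDEXED_SPLITS)]


def data_file_pattern(split: str) -> str:
    """Return the glob pattern the card associates with a split."""
    return f"data/{split}-*"


def load_split(split: str, streaming: bool = True):
    """Load one split from the Hub (requires network access).

    The import is deferred so the pure helpers above work without `datasets`.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)


all_splits = [name for prefix in PREFIXES for name in split_names(prefix)]
print(len(all_splits), all_splits[0], data_file_pattern(all_splits[0]))
```

Streaming is the default in this sketch so that inspecting a single split does not require downloading all of the data files first.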
Dataset information
- Features:
  - message: string
  - message_type: string
  - message_id: 64-bit integer
  - conversation_id: 64-bit integer
  - dataset_id: string
  - unique_conversation_id: string
  - cluster: 64-bit float
  - __index_level_0__: 64-bit integer
- Splits (size in bytes and number of examples for `orca_<index>` and `instruct_<index>`):

| Index | orca bytes | orca examples | instruct bytes | instruct examples |
|---|---|---|---|---|
| 0 | 17849715 | 18401 | 70074569 | 81024 |
| 1 | 23680133 | 28584 | 82931087 | 96749 |
| 2 | 19980410 | 17412 | 154000003 | 124814 |
| 3 | 17101778 | 32038 | 49883928 | 63327 |
| 4 | 31656753 | 34675 | 127695479 | 126005 |
| 5 | 16269511 | 14092 | 61398228 | 59076 |
| 6 | 1342860 | 2388 | 48450814 | 66011 |
| 7 | 44849080 | 36172 | 65892068 | 59876 |
| 8 | 19352268 | 18871 | 227627947 | 170841 |
| 9 | 14700372 | 15315 | 64004683 | 60637 |
| 10 | 508915 | 1446 | 24081225 | 48031 |
| 11 | 19443068 | 19745 | 82438320 | 80868 |
| 12 | 4848059 | 7172 | 166293672 | 182113 |
| 13 | 10599648 | 19167 | 84060226 | 152834 |
| 14 | 15987021 | 24048 | 59454799 | 91972 |
| 15 | 23903599 | 24410 | 85555445 | 84953 |
| 16 | 23154299 | 19289 | 101140401 | 90731 |
| 17 | 2152082 | 3809 | 66472234 | 80386 |
| 18 | 83273007 | 45544 | 110961860 | 80604 |
| 19 | 1386401 | 1644 | 37424277 | 42630 |
| 20 | 15212013 | 14602 | 94216681 | 77830 |
| 21 | 3440922 | 4174 | 124095838 | 87012 |
| 22 | 11468080 | 14191 | 63633991 | 78980 |
| 23 | 3591049 | 3778 | 95699355 | 69680 |
| 24 | 1309953 | 2395 | 82548064 | 92642 |
| 25 | 20598114 | 18715 | 132539502 | 99843 |
| 26 | 31638864 | 65463 | 52624322 | 81968 |
| 27 | 3056079 | 5939 | 29071432 | 55864 |
| 28 | 12158143 | 16039 | 67326019 | 84243 |
| 29 | 33228880 | 65846 | 16788126 | 21536 |
| 30 | 1580412 | 1991 | 15819978 | 29766 |
| 31 | 6719191 | 11269 | 29009522 | 47163 |

- Dataset size:
  - Download size: 1412051638 bytes
  - Dataset size: 3109254774 bytes



