pharaouk/SkunkData-Corpus-Clusters-001

Source: Hugging Face. Updated: 2023-09-15. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pharaouk/SkunkData-Corpus-Clusters-001
Resource description:
---
configs:
- config_name: default
  data_files:
  - {split: orca_0, path: "data/orca_0-*"}
  - {split: instruct_0, path: "data/instruct_0-*"}
  - {split: orca_1, path: "data/orca_1-*"}
  - {split: instruct_1, path: "data/instruct_1-*"}
  - {split: orca_2, path: "data/orca_2-*"}
  - {split: instruct_2, path: "data/instruct_2-*"}
  - {split: orca_3, path: "data/orca_3-*"}
  - {split: instruct_3, path: "data/instruct_3-*"}
  - {split: orca_4, path: "data/orca_4-*"}
  - {split: instruct_4, path: "data/instruct_4-*"}
  - {split: orca_5, path: "data/orca_5-*"}
  - {split: instruct_5, path: "data/instruct_5-*"}
  - {split: orca_6, path: "data/orca_6-*"}
  - {split: instruct_6, path: "data/instruct_6-*"}
  - {split: orca_7, path: "data/orca_7-*"}
  - {split: instruct_7, path: "data/instruct_7-*"}
  - {split: orca_8, path: "data/orca_8-*"}
  - {split: instruct_8, path: "data/instruct_8-*"}
  - {split: orca_9, path: "data/orca_9-*"}
  - {split: instruct_9, path: "data/instruct_9-*"}
  - {split: orca_10, path: "data/orca_10-*"}
  - {split: instruct_10, path: "data/instruct_10-*"}
  - {split: orca_11, path: "data/orca_11-*"}
  - {split: instruct_11, path: "data/instruct_11-*"}
  - {split: orca_12, path: "data/orca_12-*"}
  - {split: instruct_12, path: "data/instruct_12-*"}
  - {split: orca_13, path: "data/orca_13-*"}
  - {split: instruct_13, path: "data/instruct_13-*"}
  - {split: orca_14, path: "data/orca_14-*"}
  - {split: instruct_14, path: "data/instruct_14-*"}
  - {split: orca_15, path: "data/orca_15-*"}
  - {split: instruct_15, path: "data/instruct_15-*"}
  - {split: orca_16, path: "data/orca_16-*"}
  - {split: instruct_16, path: "data/instruct_16-*"}
  - {split: orca_17, path: "data/orca_17-*"}
  - {split: instruct_17, path: "data/instruct_17-*"}
  - {split: orca_18, path: "data/orca_18-*"}
  - {split: instruct_18, path: "data/instruct_18-*"}
  - {split: orca_19, path: "data/orca_19-*"}
  - {split: instruct_19, path: "data/instruct_19-*"}
  - {split: orca_20, path: "data/orca_20-*"}
  - {split: instruct_20, path: "data/instruct_20-*"}
  - {split: orca_21, path: "data/orca_21-*"}
  - {split: instruct_21, path: "data/instruct_21-*"}
  - {split: orca_22, path: "data/orca_22-*"}
  - {split: instruct_22, path: "data/instruct_22-*"}
  - {split: orca_23, path: "data/orca_23-*"}
  - {split: instruct_23, path: "data/instruct_23-*"}
  - {split: orca_24, path: "data/orca_24-*"}
  - {split: instruct_24, path: "data/instruct_24-*"}
  - {split: orca_25, path: "data/orca_25-*"}
  - {split: instruct_25, path: "data/instruct_25-*"}
  - {split: orca_26, path: "data/orca_26-*"}
  - {split: instruct_26, path: "data/instruct_26-*"}
  - {split: orca_27, path: "data/orca_27-*"}
  - {split: instruct_27, path: "data/instruct_27-*"}
  - {split: orca_28, path: "data/orca_28-*"}
  - {split: instruct_28, path: "data/instruct_28-*"}
  - {split: orca_29, path: "data/orca_29-*"}
  - {split: instruct_29, path: "data/instruct_29-*"}
  - {split: orca_30, path: "data/orca_30-*"}
  - {split: instruct_30, path: "data/instruct_30-*"}
  - {split: orca_31, path: "data/orca_31-*"}
  - {split: instruct_31, path: "data/instruct_31-*"}
dataset_info:
  features:
  - {name: message, dtype: string}
  - {name: message_type, dtype: string}
  - {name: message_id, dtype: int64}
  - {name: conversation_id, dtype: int64}
  - {name: dataset_id, dtype: string}
  - {name: unique_conversation_id, dtype: string}
  - {name: cluster, dtype: float64}
  - {name: __index_level_0__, dtype: int64}
  splits:
  - {name: orca_0, num_bytes: 17849715, num_examples: 18401}
  - {name: instruct_0, num_bytes: 70074569, num_examples: 81024}
  - {name: orca_1, num_bytes: 23680133, num_examples: 28584}
  - {name: instruct_1, num_bytes: 82931087, num_examples: 96749}
  - {name: orca_2, num_bytes: 19980410, num_examples: 17412}
  - {name: instruct_2, num_bytes: 154000003, num_examples: 124814}
  - {name: orca_3, num_bytes: 17101778, num_examples: 32038}
  - {name: instruct_3, num_bytes: 49883928, num_examples: 63327}
  - {name: orca_4, num_bytes: 31656753, num_examples: 34675}
  - {name: instruct_4, num_bytes: 127695479, num_examples: 126005}
  - {name: orca_5, num_bytes: 16269511, num_examples: 14092}
  - {name: instruct_5, num_bytes: 61398228, num_examples: 59076}
  - {name: orca_6, num_bytes: 1342860, num_examples: 2388}
  - {name: instruct_6, num_bytes: 48450814, num_examples: 66011}
  - {name: orca_7, num_bytes: 44849080, num_examples: 36172}
  - {name: instruct_7, num_bytes: 65892068, num_examples: 59876}
  - {name: orca_8, num_bytes: 19352268, num_examples: 18871}
  - {name: instruct_8, num_bytes: 227627947, num_examples: 170841}
  - {name: orca_9, num_bytes: 14700372, num_examples: 15315}
  - {name: instruct_9, num_bytes: 64004683, num_examples: 60637}
  - {name: orca_10, num_bytes: 508915, num_examples: 1446}
  - {name: instruct_10, num_bytes: 24081225, num_examples: 48031}
  - {name: orca_11, num_bytes: 19443068, num_examples: 19745}
  - {name: instruct_11, num_bytes: 82438320, num_examples: 80868}
  - {name: orca_12, num_bytes: 4848059, num_examples: 7172}
  - {name: instruct_12, num_bytes: 166293672, num_examples: 182113}
  - {name: orca_13, num_bytes: 10599648, num_examples: 19167}
  - {name: instruct_13, num_bytes: 84060226, num_examples: 152834}
  - {name: orca_14, num_bytes: 15987021, num_examples: 24048}
  - {name: instruct_14, num_bytes: 59454799, num_examples: 91972}
  - {name: orca_15, num_bytes: 23903599, num_examples: 24410}
  - {name: instruct_15, num_bytes: 85555445, num_examples: 84953}
  - {name: orca_16, num_bytes: 23154299, num_examples: 19289}
  - {name: instruct_16, num_bytes: 101140401, num_examples: 90731}
  - {name: orca_17, num_bytes: 2152082, num_examples: 3809}
  - {name: instruct_17, num_bytes: 66472234, num_examples: 80386}
  - {name: orca_18, num_bytes: 83273007, num_examples: 45544}
  - {name: instruct_18, num_bytes: 110961860, num_examples: 80604}
  - {name: orca_19, num_bytes: 1386401, num_examples: 1644}
  - {name: instruct_19, num_bytes: 37424277, num_examples: 42630}
  - {name: orca_20, num_bytes: 15212013, num_examples: 14602}
  - {name: instruct_20, num_bytes: 94216681, num_examples: 77830}
  - {name: orca_21, num_bytes: 3440922, num_examples: 4174}
  - {name: instruct_21, num_bytes: 124095838, num_examples: 87012}
  - {name: orca_22, num_bytes: 11468080, num_examples: 14191}
  - {name: instruct_22, num_bytes: 63633991, num_examples: 78980}
  - {name: orca_23, num_bytes: 3591049, num_examples: 3778}
  - {name: instruct_23, num_bytes: 95699355, num_examples: 69680}
  - {name: orca_24, num_bytes: 1309953, num_examples: 2395}
  - {name: instruct_24, num_bytes: 82548064, num_examples: 92642}
  - {name: orca_25, num_bytes: 20598114, num_examples: 18715}
  - {name: instruct_25, num_bytes: 132539502, num_examples: 99843}
  - {name: orca_26, num_bytes: 31638864, num_examples: 65463}
  - {name: instruct_26, num_bytes: 52624322, num_examples: 81968}
  - {name: orca_27, num_bytes: 3056079, num_examples: 5939}
  - {name: instruct_27, num_bytes: 29071432, num_examples: 55864}
  - {name: orca_28, num_bytes: 12158143, num_examples: 16039}
  - {name: instruct_28, num_bytes: 67326019, num_examples: 84243}
  - {name: orca_29, num_bytes: 33228880, num_examples: 65846}
  - {name: instruct_29, num_bytes: 16788126, num_examples: 21536}
  - {name: orca_30, num_bytes: 1580412, num_examples: 1991}
  - {name: instruct_30, num_bytes: 15819978, num_examples: 29766}
  - {name: orca_31, num_bytes: 6719191, num_examples: 11269}
  - {name: instruct_31, num_bytes: 29009522, num_examples: 47163}
  download_size: 1412051638
  dataset_size: 3109254774
---
# Dataset Card for "SkunkData-Corpus-Clusters-001"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
pharaouk
Summary of original information

Dataset overview

Dataset configuration

  • Config name: default
  • Data file paths:
    • orca_0: data/orca_0-*
    • instruct_0: data/instruct_0-*
    • orca_1: data/orca_1-*
    • instruct_1: data/instruct_1-*
    • orca_2: data/orca_2-*
    • instruct_2: data/instruct_2-*
    • orca_3: data/orca_3-*
    • instruct_3: data/instruct_3-*
    • orca_4: data/orca_4-*
    • instruct_4: data/instruct_4-*
    • orca_5: data/orca_5-*
    • instruct_5: data/instruct_5-*
    • orca_6: data/orca_6-*
    • instruct_6: data/instruct_6-*
    • orca_7: data/orca_7-*
    • instruct_7: data/instruct_7-*
    • orca_8: data/orca_8-*
    • instruct_8: data/instruct_8-*
    • orca_9: data/orca_9-*
    • instruct_9: data/instruct_9-*
    • orca_10: data/orca_10-*
    • instruct_10: data/instruct_10-*
    • orca_11: data/orca_11-*
    • instruct_11: data/instruct_11-*
    • orca_12: data/orca_12-*
    • instruct_12: data/instruct_12-*
    • orca_13: data/orca_13-*
    • instruct_13: data/instruct_13-*
    • orca_14: data/orca_14-*
    • instruct_14: data/instruct_14-*
    • orca_15: data/orca_15-*
    • instruct_15: data/instruct_15-*
    • orca_16: data/orca_16-*
    • instruct_16: data/instruct_16-*
    • orca_17: data/orca_17-*
    • instruct_17: data/instruct_17-*
    • orca_18: data/orca_18-*
    • instruct_18: data/instruct_18-*
    • orca_19: data/orca_19-*
    • instruct_19: data/instruct_19-*
    • orca_20: data/orca_20-*
    • instruct_20: data/instruct_20-*
    • orca_21: data/orca_21-*
    • instruct_21: data/instruct_21-*
    • orca_22: data/orca_22-*
    • instruct_22: data/instruct_22-*
    • orca_23: data/orca_23-*
    • instruct_23: data/instruct_23-*
    • orca_24: data/orca_24-*
    • instruct_24: data/instruct_24-*
    • orca_25: data/orca_25-*
    • instruct_25: data/instruct_25-*
    • orca_26: data/orca_26-*
    • instruct_26: data/instruct_26-*
    • orca_27: data/orca_27-*
    • instruct_27: data/instruct_27-*
    • orca_28: data/orca_28-*
    • instruct_28: data/instruct_28-*
    • orca_29: data/orca_29-*
    • instruct_29: data/instruct_29-*
    • orca_30: data/orca_30-*
    • instruct_30: data/instruct_30-*
    • orca_31: data/orca_31-*
    • instruct_31: data/instruct_31-*
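
The split names in the listing above follow a regular pattern: 32 numbered clusters (0 through 31), each with an `orca` and an `instruct` split. The following is a minimal sketch of how one might enumerate and load them, assuming the Hugging Face `datasets` library is installed and the repository is still reachable. The helper names `split_names` and `load_cluster` are hypothetical and are not part of the dataset.

```python
from typing import List

REPO_ID = "pharaouk/SkunkData-Corpus-Clusters-001"
NUM_CLUSTERS = 32  # clusters are numbered 0..31 in the listing above
SOURCES = ("orca", "instruct")


def split_names() -> List[str]:
    """Return all split names in listing order: orca_0, instruct_0, orca_1, ..."""
    return [f"{src}_{i}" for i in range(NUM_CLUSTERS) for src in SOURCES]


def load_cluster(source: str, cluster: int):
    """Load a single split, e.g. load_cluster("orca", 3) loads "orca_3".

    The argument check runs before the lazy import, so invalid names fail
    fast without touching the network. The first valid call downloads data.
    """
    if source not in SOURCES or not 0 <= cluster < NUM_CLUSTERS:
        raise ValueError(f"unknown split: {source}_{cluster}")
    from datasets import load_dataset  # assumption: `pip install datasets`
    return load_dataset(REPO_ID, split=f"{source}_{cluster}")
```

Loading one split at a time keeps the download limited to the files that match that split's path pattern, which may matter given the differences in split size.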

Dataset information

  • Features:

    • message: string
    • message_type: string
    • message_id: 64-bit integer
    • conversation_id: 64-bit integer
    • dataset_id: string
    • unique_conversation_id: string
    • cluster: 64-bit float
    • __index_level_0__: 64-bit integer
  • Splits:

    • orca_0: 17849715 bytes, 18401 examples
    • instruct_0: 70074569 bytes, 81024 examples
    • orca_1: 23680133 bytes, 28584 examples
    • instruct_1: 82931087 bytes, 96749 examples
    • orca_2: 19980410 bytes, 17412 examples
    • instruct_2: 154000003 bytes, 124814 examples
    • orca_3: 17101778 bytes, 32038 examples
    • instruct_3: 49883928 bytes, 63327 examples
    • orca_4: 31656753 bytes, 34675 examples
    • instruct_4: 127695479 bytes, 126005 examples
    • orca_5: 16269511 bytes, 14092 examples
    • instruct_5: 61398228 bytes, 59076 examples
    • orca_6: 1342860 bytes, 2388 examples
    • instruct_6: 48450814 bytes, 66011 examples
    • orca_7: 44849080 bytes, 36172 examples
    • instruct_7: 65892068 bytes, 59876 examples
    • orca_8: 19352268 bytes, 18871 examples
    • instruct_8: 227627947 bytes, 170841 examples
    • orca_9: 14700372 bytes, 15315 examples
    • instruct_9: 64004683 bytes, 60637 examples
    • orca_10: 508915 bytes, 1446 examples
    • instruct_10: 24081225 bytes, 48031 examples
    • orca_11: 19443068 bytes, 19745 examples
    • instruct_11: 82438320 bytes, 80868 examples
    • orca_12: 4848059 bytes, 7172 examples
    • instruct_12: 166293672 bytes, 182113 examples
    • orca_13: 10599648 bytes, 19167 examples
    • instruct_13: 84060226 bytes, 152834 examples
    • orca_14: 15987021 bytes, 24048 examples
    • instruct_14: 59454799 bytes, 91972 examples
    • orca_15: 23903599 bytes, 24410 examples
    • instruct_15: 85555445 bytes, 84953 examples
    • orca_16: 23154299 bytes, 19289 examples
    • instruct_16: 101140401 bytes, 90731 examples
    • orca_17: 2152082 bytes, 3809 examples
    • instruct_17: 66472234 bytes, 80386 examples
    • orca_18: 83273007 bytes, 45544 examples
    • instruct_18: 110961860 bytes, 80604 examples
    • orca_19: 1386401 bytes, 1644 examples
    • instruct_19: 37424277 bytes, 42630 examples
    • orca_20: 15212013 bytes, 14602 examples
    • instruct_20: 94216681 bytes, 77830 examples
    • orca_21: 3440922 bytes, 4174 examples
    • instruct_21: 124095838 bytes, 87012 examples
    • orca_22: 11468080 bytes, 14191 examples
    • instruct_22: 63633991 bytes, 78980 examples
    • orca_23: 3591049 bytes, 3778 examples
    • instruct_23: 95699355 bytes, 69680 examples
    • orca_24: 1309953 bytes, 2395 examples
    • instruct_24: 82548064 bytes, 92642 examples
    • orca_25: 20598114 bytes, 18715 examples
    • instruct_25: 132539502 bytes, 99843 examples
    • orca_26: 31638864 bytes, 65463 examples
    • instruct_26: 52624322 bytes, 81968 examples
    • orca_27: 3056079 bytes, 5939 examples
    • instruct_27: 29071432 bytes, 55864 examples
    • orca_28: 12158143 bytes, 16039 examples
    • instruct_28: 67326019 bytes, 84243 examples
    • orca_29: 33228880 bytes, 65846 examples
    • instruct_29: 16788126 bytes, 21536 examples
    • orca_30: 1580412 bytes, 1991 examples
    • instruct_30: 15819978 bytes, 29766 examples
    • orca_31: 6719191 bytes, 11269 examples
    • instruct_31: 29009522 bytes, 47163 examples
  • Dataset size:

    • Download size: 1412051638 bytes
    • Dataset size: 3109254774 bytes
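
The feature list and the two size figures above can be written down as a typed record plus a quick ratio check. This is a sketch only: the class name `Message`, the helper `from_row`, and the renamed field `index_level_0` are hypothetical choices for illustration. The field types simply mirror the listed dtypes (`string` as `str`, `int64` as `int`, `float64` as `float`), and no meaning is assumed for any field beyond its name.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class Message:
    """One row of the dataset, with types mirroring the listed dtypes."""
    message: str
    message_type: str
    message_id: int
    conversation_id: int
    dataset_id: str
    unique_conversation_id: str
    cluster: float
    index_level_0: int  # stored as "__index_level_0__" in the dataset


def from_row(row: Mapping[str, Any]) -> Message:
    """Build a Message from a row keyed by the dataset's column names."""
    return Message(
        message=row["message"],
        message_type=row["message_type"],
        message_id=row["message_id"],
        conversation_id=row["conversation_id"],
        dataset_id=row["dataset_id"],
        unique_conversation_id=row["unique_conversation_id"],
        cluster=row["cluster"],
        index_level_0=row["__index_level_0__"],
    )


# Sizes in bytes, copied from the listing above.
DOWNLOAD_SIZE = 1412051638
DATASET_SIZE = 3109254774

# Fraction of the full dataset size that has to be downloaded.
DOWNLOAD_RATIO = DOWNLOAD_SIZE / DATASET_SIZE
```

With these figures the download is a little under half of the stated dataset size.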