
Brendan/multiwoz_turns_v22

Hugging Face, updated 2023-11-11, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Brendan/multiwoz_turns_v22
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
  - split: valid_20p_ablation
    path: data/valid_20p_ablation-*
  - split: valid_10p
    path: data/valid_10p-*
  - split: valid_50p
    path: data/valid_50p-*
  - split: 1p_train_v1
    path: data/1p_train_v1-*
  - split: 1p_train_v2
    path: data/1p_train_v2-*
  - split: 1p_train_v3
    path: data/1p_train_v3-*
  - split: 5p_train_v1
    path: data/5p_train_v1-*
  - split: 5p_train_v2
    path: data/5p_train_v2-*
  - split: 5p_train_v3
    path: data/5p_train_v3-*
  - split: 10p_train_v1
    path: data/10p_train_v1-*
  - split: 10p_train_v2
    path: data/10p_train_v2-*
  - split: 10p_train_v3
    path: data/10p_train_v3-*
  - split: train_evaluable_only
    path: data/train_evaluable_only-*
  - split: valid_evaluable_only
    path: data/valid_evaluable_only-*
dataset_info:
  features:
  - name: dialogue_id
    dtype: string
  - name: turn_id
    dtype: int8
  - name: domains
    sequence: string
  - name: system_utterances
    sequence: string
  - name: user_utterances
    sequence: string
  - name: slot_values
    struct:
    - name: hotel
      struct:
      - name: price range
        dtype: string
      - name: type
        dtype: string
      - name: parking
        dtype: string
      - name: book day
        dtype: string
      - name: book people
        dtype: string
      - name: book stay
        dtype: string
      - name: stars
        dtype: string
      - name: internet
        dtype: string
      - name: name
        dtype: string
      - name: area
        dtype: string
    - name: train
      struct:
      - name: arrive by
        dtype: string
      - name: departure
        dtype: string
      - name: day
        dtype: string
      - name: book people
        dtype: string
      - name: leave at
        dtype: string
      - name: destination
        dtype: string
    - name: attraction
      struct:
      - name: area
        dtype: string
      - name: name
        dtype: string
      - name: type
        dtype: string
    - name: restaurant
      struct:
      - name: price range
        dtype: string
      - name: area
        dtype: string
      - name: food
        dtype: string
      - name: name
        dtype: string
      - name: book day
        dtype: string
      - name: book people
        dtype: string
      - name: book time
        dtype: string
    - name: hospital
      struct:
      - name: department
        dtype: string
    - name: taxi
      struct:
      - name: leave at
        dtype: string
      - name: destination
        dtype: string
      - name: departure
        dtype: string
      - name: arrive by
        dtype: string
    - name: bus
      struct:
      - name: departure
        dtype: string
      - name: destination
        dtype: string
      - name: leave at
        dtype: string
      - name: day
        dtype: string
    - name: police
      struct:
      - name: name
        dtype: string
  - name: turn_slot_values
    struct:
    - name: hotel
      struct:
      - name: price range
        dtype: string
      - name: type
        dtype: string
      - name: parking
        dtype: string
      - name: book day
        dtype: string
      - name: book people
        dtype: string
      - name: book stay
        dtype: string
      - name: stars
        dtype: string
      - name: internet
        dtype: string
      - name: name
        dtype: string
      - name: area
        dtype: string
    - name: train
      struct:
      - name: arrive by
        dtype: string
      - name: departure
        dtype: string
      - name: day
        dtype: string
      - name: book people
        dtype: string
      - name: leave at
        dtype: string
      - name: destination
        dtype: string
    - name: attraction
      struct:
      - name: area
        dtype: string
      - name: name
        dtype: string
      - name: type
        dtype: string
    - name: restaurant
      struct:
      - name: price range
        dtype: string
      - name: area
        dtype: string
      - name: food
        dtype: string
      - name: name
        dtype: string
      - name: book day
        dtype: string
      - name: book people
        dtype: string
      - name: book time
        dtype: string
    - name: hospital
      struct:
      - name: department
        dtype: string
    - name: taxi
      struct:
      - name: leave at
        dtype: string
      - name: destination
        dtype: string
      - name: departure
        dtype: string
      - name: arrive by
        dtype: string
    - name: bus
      struct:
      - name: departure
        dtype: string
      - name: destination
        dtype: string
      - name: leave at
        dtype: string
      - name: day
        dtype: string
    - name: police
      struct:
      - name: name
        dtype: string
  - name: last_slot_values
    struct:
    - name: hotel
      struct:
      - name: price range
        dtype: string
      - name: type
        dtype: string
      - name: parking
        dtype: string
      - name: book day
        dtype: string
      - name: book people
        dtype: string
      - name: book stay
        dtype: string
      - name: stars
        dtype: string
      - name: internet
        dtype: string
      - name: name
        dtype: string
      - name: area
        dtype: string
    - name: train
      struct:
      - name: arrive by
        dtype: string
      - name: departure
        dtype: string
      - name: day
        dtype: string
      - name: book people
        dtype: string
      - name: leave at
        dtype: string
      - name: destination
        dtype: string
    - name: attraction
      struct:
      - name: area
        dtype: string
      - name: name
        dtype: string
      - name: type
        dtype: string
    - name: restaurant
      struct:
      - name: price range
        dtype: string
      - name: area
        dtype: string
      - name: food
        dtype: string
      - name: name
        dtype: string
      - name: book day
        dtype: string
      - name: book people
        dtype: string
      - name: book time
        dtype: string
    - name: hospital
      struct:
      - name: department
        dtype: string
    - name: taxi
      struct:
      - name: leave at
        dtype: string
      - name: destination
        dtype: string
      - name: departure
        dtype: string
      - name: arrive by
        dtype: string
    - name: bus
      struct:
      - name: departure
        dtype: string
      - name: destination
        dtype: string
      - name: leave at
        dtype: string
      - name: day
        dtype: string
    - name: police
      struct:
      - name: name
        dtype: string
  - name: last_system_response_acts
    sequence: string
  - name: system_response_acts
    sequence: string
  - name: system_response
    dtype: string
  splits:
  - name: train
    num_bytes: 84139088
    num_examples: 56776
  - name: validation
    num_bytes: 11271758
    num_examples: 7374
  - name: test
    num_bytes: 11295224
    num_examples: 7372
  - name: valid_20p_ablation
    num_bytes: 2273000.2910225117
    num_examples: 1487
  - name: valid_10p
    num_bytes: 1114335.7176566315
    num_examples: 729
  - name: valid_50p
    num_bytes: 5667979.2058584215
    num_examples: 3708
  - name: 1p_train_v1
    num_bytes: 798770.0512892772
    num_examples: 539
  - name: 1p_train_v2
    num_bytes: 890650.8364097506
    num_examples: 601
  - name: 1p_train_v3
    num_bytes: 861011.8734676624
    num_examples: 581
  - name: 5p_train_v1
    num_bytes: 4245781.441454136
    num_examples: 2865
  - name: 5p_train_v2
    num_bytes: 4103514.419332112
    num_examples: 2769
  - name: 5p_train_v3
    num_bytes: 4220588.32295336
    num_examples: 2848
  - name: 10p_train_v1
    num_bytes: 8368561.186698605
    num_examples: 5647
  - name: 10p_train_v2
    num_bytes: 8447104.438495139
    num_examples: 5700
  - name: 10p_train_v3
    num_bytes: 8398200.149640692
    num_examples: 5667
  - name: train_evaluable_only
    num_bytes: 83498886.4004509
    num_examples: 56344
  - name: valid_evaluable_only
    num_bytes: 11261057.931380527
    num_examples: 7367
  download_size: 39840521
  dataset_size: 250855512.26610973
---
# Dataset Card for "multiwoz_turns_v22"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
Brendan
Summary of the original information

Dataset overview

Dataset configuration

  • Default configuration
    • Training set (train): path data/train-*
    • Validation set (validation): path data/validation-*
    • Test set (test): path data/test-*
    • Other splits
      • valid_20p_ablation: path data/valid_20p_ablation-*
      • valid_10p: path data/valid_10p-*
      • valid_50p: path data/valid_50p-*
      • 1p_train_v1: path data/1p_train_v1-*
      • 1p_train_v2: path data/1p_train_v2-*
      • 1p_train_v3: path data/1p_train_v3-*
      • 5p_train_v1: path data/5p_train_v1-*
      • 5p_train_v2: path data/5p_train_v2-*
      • 5p_train_v3: path data/5p_train_v3-*
      • 10p_train_v1: path data/10p_train_v1-*
      • 10p_train_v2: path data/10p_train_v2-*
      • 10p_train_v3: path data/10p_train_v3-*
      • train_evaluable_only: path data/train_evaluable_only-*
      • valid_evaluable_only: path data/valid_evaluable_only-*
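Each split is bound to its files by a glob pattern. As a rough illustration, the sketch below uses Python's `fnmatch` as a stand-in for the Hub's own pattern matching (an assumption; the two may differ in edge cases) and checks which split a given file would resolve to. The shard file names are hypothetical, since this page does not list the actual files.

```python
from fnmatch import fnmatch

# Split name -> file pattern, copied from the default configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
    "valid_20p_ablation": "data/valid_20p_ablation-*",
    "valid_10p": "data/valid_10p-*",
    "valid_50p": "data/valid_50p-*",
    "1p_train_v1": "data/1p_train_v1-*",
    "1p_train_v2": "data/1p_train_v2-*",
    "1p_train_v3": "data/1p_train_v3-*",
    "5p_train_v1": "data/5p_train_v1-*",
    "5p_train_v2": "data/5p_train_v2-*",
    "5p_train_v3": "data/5p_train_v3-*",
    "10p_train_v1": "data/10p_train_v1-*",
    "10p_train_v2": "data/10p_train_v2-*",
    "10p_train_v3": "data/10p_train_v3-*",
    "train_evaluable_only": "data/train_evaluable_only-*",
    "valid_evaluable_only": "data/valid_evaluable_only-*",
}


def splits_for(path: str) -> list[str]:
    """Return every split whose pattern matches the given file path."""
    return [split for split, pattern in DATA_FILES.items() if fnmatch(path, pattern)]


if __name__ == "__main__":
    # Hypothetical shard names: the real file names are not given on this page.
    for path in (
        "data/train-00000-of-00001.parquet",
        "data/train_evaluable_only-00000-of-00001.parquet",
        "data/10p_train_v1-00000-of-00001.parquet",
    ):
        print(path, "->", splits_for(path))
```

The hyphen that follows every split name is what keeps `data/train-*` from also matching the `train_evaluable_only` files, and `data/1p_train_v1-*` from matching the `10p_train_v1` files.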

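Dataset features

  • Basic features
    • dialogue_id: dialogue ID, string
    • turn_id: turn ID, integer (int8)
    • domains: domains, sequence of strings
    • system_utterances: system utterances, sequence of strings
    • user_utterances: user utterances, sequence of strings
    • slot_values: slot values, a struct with one sub-struct per domain (hotel, train, attraction, restaurant, hospital, taxi, bus, police), each holding that domain's slots as strings
    • turn_slot_values: per-turn slot values, same struct layout as slot_values
    • last_slot_values: slot values from the previous turn, same struct layout as slot_values
    • last_system_response_acts: acts of the previous system response, sequence of strings
    • system_response_acts: acts of the system response, sequence of strings
    • system_response: system response, string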
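The three slot-value columns share one nested layout: a struct per domain, each mapping slot names to string values. The sketch below flattens that layout into `domain-slot` keys and lists the slots that differ between two states. It rests on assumptions the card does not document: that empty slots (or whole domains) come back as `None`, that `last_slot_values` is the state before the current turn, and that the difference from `slot_values` is roughly what `turn_slot_values` records. The helper names and the key format are hypothetical.

```python
from typing import Optional

# Nested {domain: {slot: value}} mapping mirroring the struct columns above.
# Assumption: unfilled slots, or entire domains, are represented as None.
Slots = dict[str, Optional[dict[str, Optional[str]]]]


def flatten(slots: Slots) -> dict[str, str]:
    """Turn the nested struct into {"domain-slot": value}, dropping empty slots."""
    flat = {}
    for domain, fields in slots.items():
        for slot, value in (fields or {}).items():
            if value is not None:
                flat[f"{domain}-{slot}"] = value
    return flat


def changed_slots(current: Slots, previous: Slots) -> dict[str, str]:
    """Slots whose value in `current` is new or differs from `previous`.

    Slots that were filled before but are empty now are not reported.
    """
    now, before = flatten(current), flatten(previous)
    return {key: value for key, value in now.items() if before.get(key) != value}


if __name__ == "__main__":
    last = {"hotel": {"area": "north", "stars": None}, "train": None}
    current = {"hotel": {"area": "north", "stars": "4"}, "train": {"day": "friday"}}
    print(changed_slots(current, last))  # {'hotel-stars': '4', 'train-day': 'friday'}
```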

Dataset splits

  • Training set (train)
    • Bytes: 84139088
    • Examples: 56776
  • Validation set (validation)
    • Bytes: 11271758
    • Examples: 7374
  • Test set (test)
    • Bytes: 11295224
    • Examples: 7372
  • Other splits
    • valid_20p_ablation
      • Bytes: 2273000.2910225117
      • Examples: 1487
    • valid_10p
      • Bytes: 1114335.7176566315
      • Examples: 729
    • valid_50p
      • Bytes: 5667979.2058584215
      • Examples: 3708
    • 1p_train_v1
      • Bytes: 798770.0512892772
      • Examples: 539
    • 1p_train_v2
      • Bytes: 890650.8364097506
      • Examples: 601
    • 1p_train_v3
      • Bytes: 861011.8734676624
      • Examples: 581
    • 5p_train_v1
      • Bytes: 4245781.441454136
      • Examples: 2865
    • 5p_train_v2
      • Bytes: 4103514.419332112
      • Examples: 2769
    • 5p_train_v3
      • Bytes: 4220588.32295336
      • Examples: 2848
    • 10p_train_v1
      • Bytes: 8368561.186698605
      • Examples: 5647
    • 10p_train_v2
      • Bytes: 8447104.438495139
      • Examples: 5700
    • 10p_train_v3
      • Bytes: 8398200.149640692
      • Examples: 5667
    • train_evaluable_only
      • Bytes: 83498886.4004509
      • Examples: 56344
    • valid_evaluable_only
      • Bytes: 11261057.931380527
      • Examples: 7367
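The subset names suggest a percentage of a parent split (10p_train_v1, for instance, holds 5647 of the 56776 training examples, about 9.95%). The sketch below assumes, from the names alone, that every subset starting with `valid` was drawn from validation and every other subset from train, then reports each subset's share of its parent's examples. It also compares bytes per example with the parent's average: a ratio of exactly 1.0 would suggest that a subset's fractional byte count was prorated from the parent rather than measured, which is one plausible reading of the non-integer sizes.

```python
# (num_bytes, num_examples) per split, copied from the split list above.
SPLITS = {
    "train": (84139088, 56776),
    "validation": (11271758, 7374),
    "test": (11295224, 7372),
    "valid_20p_ablation": (2273000.2910225117, 1487),
    "valid_10p": (1114335.7176566315, 729),
    "valid_50p": (5667979.2058584215, 3708),
    "1p_train_v1": (798770.0512892772, 539),
    "1p_train_v2": (890650.8364097506, 601),
    "1p_train_v3": (861011.8734676624, 581),
    "5p_train_v1": (4245781.441454136, 2865),
    "5p_train_v2": (4103514.419332112, 2769),
    "5p_train_v3": (4220588.32295336, 2848),
    "10p_train_v1": (8368561.186698605, 5647),
    "10p_train_v2": (8447104.438495139, 5700),
    "10p_train_v3": (8398200.149640692, 5667),
    "train_evaluable_only": (83498886.4004509, 56344),
    "valid_evaluable_only": (11261057.931380527, 7367),
}
FULL_SPLITS = ("train", "validation", "test")


def parent_split(name: str) -> str:
    """Assumed parent of a subset split, inferred only from its name."""
    return "validation" if name.startswith("valid") else "train"


def subset_report(splits: dict) -> dict:
    """Map each subset to (share of parent examples, bytes/example ratio vs. parent)."""
    report = {}
    for name, (num_bytes, num_examples) in splits.items():
        if name in FULL_SPLITS:
            continue
        parent_bytes, parent_examples = splits[parent_split(name)]
        share = num_examples / parent_examples
        density_ratio = (num_bytes / num_examples) / (parent_bytes / parent_examples)
        report[name] = (share, density_ratio)
    return report


if __name__ == "__main__":
    for name, (share, ratio) in subset_report(SPLITS).items():
        print(f"{name:<22} {share:7.2%} of parent examples, bytes/example ratio {ratio:.9f}")
```

With these numbers the shares land close to, but not exactly on, the nominal 1%, 5%, 10%, 20%, and 50%, and every bytes-per-example ratio comes out at 1.0 within rounding. How the subsets were actually sampled is not stated on the card.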

Dataset size

  • Download size: 39840521 bytes
  • Dataset size: 250855512.26610973 bytes
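The dataset size equals the sum of num_bytes over all 17 splits, which is easy to confirm from the list above. A consequence worth noting: if the percentage and evaluable-only subsets are drawn from train and validation, as their names suggest (an assumption), those rows are counted again, so the dataset size overstates the amount of distinct data. The sketch below checks the sum and reports how much of the total the three full splits account for.

```python
import math

# num_bytes of each split, copied from the split list above.
SPLIT_BYTES = {
    "train": 84139088,
    "validation": 11271758,
    "test": 11295224,
    "valid_20p_ablation": 2273000.2910225117,
    "valid_10p": 1114335.7176566315,
    "valid_50p": 5667979.2058584215,
    "1p_train_v1": 798770.0512892772,
    "1p_train_v2": 890650.8364097506,
    "1p_train_v3": 861011.8734676624,
    "5p_train_v1": 4245781.441454136,
    "5p_train_v2": 4103514.419332112,
    "5p_train_v3": 4220588.32295336,
    "10p_train_v1": 8368561.186698605,
    "10p_train_v2": 8447104.438495139,
    "10p_train_v3": 8398200.149640692,
    "train_evaluable_only": 83498886.4004509,
    "valid_evaluable_only": 11261057.931380527,
}
DOWNLOAD_SIZE = 39840521
DATASET_SIZE = 250855512.26610973

# math.fsum keeps rounding error from accumulating across the fractional sizes.
total = math.fsum(SPLIT_BYTES.values())
core = SPLIT_BYTES["train"] + SPLIT_BYTES["validation"] + SPLIT_BYTES["test"]

print(f"sum over splits: {total:.2f} bytes")
print("equals dataset size:", math.isclose(total, DATASET_SIZE, rel_tol=1e-9))
print(f"train + validation + test: {core / DATASET_SIZE:.1%} of dataset size")
print(f"download size / dataset size: {DOWNLOAD_SIZE / DATASET_SIZE:.1%}")
```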