ConvLab/metalwoz

Name: ConvLab/metalwoz
Creator: ConvLab
Published: 2022-11-25 09:11:36
License: No description available

Hugging Face | Updated: 2022-11-25 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/ConvLab/metalwoz

Resource description:

---
language:
- en
license: []
multilinguality:
- monolingual
pretty_name: MetaLWOZ
size_categories:
- 10K<n<100K
task_categories:
- conversational
---

# Dataset Card for MetaLWOZ

- **Repository:** https://www.microsoft.com/en-us/research/project/metalwoz/
- **Paper:** https://www.microsoft.com/en-us/research/publication/results-of-the-multi-domain-task-completion-dialog-challenge/
- **Leaderboard:** None
- **Who transforms the dataset:** Qi Zhu(zhuq96 at gmail dot com)

To use this dataset, you need to install the [ConvLab-3](https://github.com/ConvLab/ConvLab-3) platform first. Then you can load the dataset via:

```
# Requires the ConvLab-3 platform to be installed.
from convlab.util import load_dataset, load_ontology, load_database

dataset = load_dataset('metalwoz')
ontology = load_ontology('metalwoz')
database = load_database('metalwoz')
```

For more usage please refer to [here](https://github.com/ConvLab/ConvLab-3/tree/master/data/unified_datasets).

### Dataset Summary

This large dataset was created by crowdsourcing 37,884 goal-oriented dialogs, covering 227 tasks in 47 domains. Domains include bus schedules, apartment search, alarm setting, banking, and event reservation. Each dialog was grounded in a scenario with roles, pairing a person acting as the bot and a person acting as the user. (This is the Wizard of Oz reference: people behind the curtain act as the machine.) Each pair was given a domain and a task, and instructed to converse for 10 turns to satisfy the user’s queries. For example, if a user asked if a bus stop was operational, the bot would respond that the bus stop had been moved two blocks north, which starts a conversation that addresses the user’s actual need.

- **How to get the transformed data from original data:**
  - Download [metalwoz-v1.zip](https://www.microsoft.com/en-us/download/58389) and [metalwoz-test-v1.zip](https://www.microsoft.com/en-us/download/100639).
  - Run `python preprocess.py` in the current directory.
- **Main changes of the transformation:**
  - `CITY_INFO`, `HOME_BOT`, `NAME_SUGGESTER`, and `TIME_ZONE` are randomly selected as the validation domains.
  - Remove the first utterance by the system since it is "Hello how may I help you?" in most cases.
  - Add goal description according to the original task description: user_role+user_prompt+system_role+system_prompt.
- **Annotations:**
  - domain, goal

### Supported Tasks and Leaderboards

RG, User simulator

### Languages

English

### Data Splits

| split      | dialogues | utterances | avg_utt | avg_tokens | avg_domains | cat slot match(state) | cat slot match(goal) | cat slot match(dialogue act) | non-cat slot span(dialogue act) |
|------------|-----------|------------|---------|------------|-------------|-----------------------|----------------------|------------------------------|---------------------------------|
| train      | 34261     | 357092     | 10.42   | 7.48       | 1           | -                     | -                    | -                            | -                               |
| validation | 3623      | 37060      | 10.23   | 6.59       | 1           | -                     | -                    | -                            | -                               |
| test       | 2319      | 23882      | 10.3    | 7.96       | 1           | -                     | -                    | -                            | -                               |
| all        | 40203     | 418034     | 10.4    | 7.43       | 1           | -                     | -                    | -                            | -                               |

51 domains: ['AGREEMENT_BOT', 'ALARM_SET', 'APARTMENT_FINDER', 'APPOINTMENT_REMINDER', 'AUTO_SORT', 'BANK_BOT', 'BUS_SCHEDULE_BOT', 'CATALOGUE_BOT', 'CHECK_STATUS', 'CITY_INFO', 'CONTACT_MANAGER', 'DECIDER_BOT', 'EDIT_PLAYLIST', 'EVENT_RESERVE', 'GAME_RULES', 'GEOGRAPHY', 'GUINESS_CHECK', 'HOME_BOT', 'HOW_TO_BASIC', 'INSURANCE', 'LIBRARY_REQUEST', 'LOOK_UP_INFO', 'MAKE_RESTAURANT_RESERVATIONS', 'MOVIE_LISTINGS', 'MUSIC_SUGGESTER', 'NAME_SUGGESTER', 'ORDER_PIZZA', 'PET_ADVICE', 'PHONE_PLAN_BOT', 'PHONE_SETTINGS', 'PLAY_TIMES', 'POLICY_BOT', 'PRESENT_IDEAS', 'PROMPT_GENERATOR', 'QUOTE_OF_THE_DAY_BOT', 'RESTAURANT_PICKER', 'SCAM_LOOKUP', 'SHOPPING', 'SKI_BOT', 'SPORTS_INFO', 'STORE_DETAILS', 'TIME_ZONE', 'UPDATE_CALENDAR', 'UPDATE_CONTACT', 'WEATHER_CHECK', 'WEDDING_PLANNER', 'WHAT_IS_IT', 'BOOKING_FLIGHT', 'HOTEL_RESERVE', 'TOURISM', 'VACATION_IDEAS']

- **cat slot match**: the percentage of categorical slot values that appear among the possible values listed in the ontology.
- **non-cat slot span**: the percentage of non-categorical slot values that have a span annotation.

### Citation

```
@inproceedings{li2020results,
  author = {Li, Jinchao and Peng, Baolin and Lee, Sungjin and Gao, Jianfeng and Takanobu, Ryuichi and Zhu, Qi and Minlie Huang and Schulz, Hannes and Atkinson, Adam and Adada, Mahmoud},
  title = {Results of the Multi-Domain Task-Completion Dialog Challenge},
  booktitle = {Proceedings of the 34th AAAI Conference on Artificial Intelligence, Eighth Dialog System Technology Challenge Workshop},
  year = {2020},
  month = {February},
  url = {https://www.microsoft.com/en-us/research/publication/results-of-the-multi-domain-task-completion-dialog-challenge/},
}
```

### Licensing Information

[Microsoft Research Data License Agreement](https://msropendata-web-api.azurewebsites.net/licenses/2f933be3-284d-500b-7ea3-2aa2fd0f1bb2/view)

Provider:

ConvLab

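Summary of Original Information

Dataset Overview

Name: MetaLWOZ
Language: English
Size: 10K<n<100K
Task type: Conversational
Creation method: Crowdsourced; contains 37,884 goal-oriented dialogs covering 227 tasks in 47 domains.

Dataset Content

Domains: 47 domains, including bus schedules, apartment search, alarm setting, banking, and event reservation.
Dialog structure: Each dialog is grounded in a scenario and played by two people, one acting as the bot and the other as the user, who converse for 10 turns to satisfy the user's queries.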
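
As a rough illustration of the structure described above, the sketch below groups dialogs by domain and counts utterances per speaker. The record layout (the `dialogue_id`, `domains`, `turns`, `speaker`, and `utterance` keys) is an assumption modeled on ConvLab-3's unified data format, and the two toy dialogs are invented for illustration rather than taken from the dataset.

```python
from collections import Counter, defaultdict

# Toy dialogs in an ASSUMED unified-format layout; the contents are made up.
toy_dialogues = [
    {
        "dialogue_id": "toy-0",
        "domains": ["BUS_SCHEDULE_BOT"],
        "turns": [
            {"speaker": "user", "utterance": "Is the bus stop on Main Street operational?"},
            {"speaker": "system", "utterance": "That stop has been moved two blocks north."},
            {"speaker": "user", "utterance": "Thanks, which buses stop there now?"},
        ],
    },
    {
        "dialogue_id": "toy-1",
        "domains": ["ALARM_SET"],
        "turns": [
            {"speaker": "user", "utterance": "Set an alarm for 7 am."},
            {"speaker": "system", "utterance": "Done. Should it repeat every day?"},
        ],
    },
]


def summarize(dialogues):
    """Return (dialog ids grouped by domain, utterance counts per speaker)."""
    by_domain = defaultdict(list)
    speaker_counts = Counter()
    for dialogue in dialogues:
        for domain in dialogue["domains"]:
            by_domain[domain].append(dialogue["dialogue_id"])
        for turn in dialogue["turns"]:
            speaker_counts[turn["speaker"]] += 1
    return dict(by_domain), speaker_counts


by_domain, speaker_counts = summarize(toy_dialogues)
print(by_domain)
print(speaker_counts)
```

If the real records follow the same layout, the same function could be applied to one split of the loaded dataset; that should be checked against the actual data before relying on it.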

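Data Processing

Transformation from the original data:
- Download metalwoz-v1.zip and metalwoz-test-v1.zip.
- Run python preprocess.py to preprocess the data.
Main changes:
- CITY_INFO, HOME_BOT, NAME_SUGGESTER, and TIME_ZONE are randomly selected as the validation domains.
- The first utterance by the system is removed, since it is usually "Hello how may I help you?".
- A goal description is added based on the original task description.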
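
The main changes listed above can be sketched roughly as follows. This is not the actual `preprocess.py`; it is a minimal illustration under stated assumptions: the raw dialog is a dict with a `domain` string and a `turns` list of strings that starts with the system greeting, the task record uses hypothetical keys `user_role`, `user_prompt`, `system_role`, and `system_prompt` (the raw files may name them differently), the goal parts are joined with spaces, and dialogs from the test archive form the test split.

```python
# Validation domains named in the change list above.
VALIDATION_DOMAINS = {"CITY_INFO", "HOME_BOT", "NAME_SUGGESTER", "TIME_ZONE"}


def build_goal_description(task):
    """Join user_role + user_prompt + system_role + system_prompt.

    The space separator and the key names are assumptions.
    """
    keys = ("user_role", "user_prompt", "system_role", "system_prompt")
    return " ".join(task[key].strip() for key in keys)


def assign_split(domain, from_test_archive):
    """Pick a split: test archive -> test, validation domains -> validation."""
    if from_test_archive:
        return "test"
    return "validation" if domain in VALIDATION_DOMAINS else "train"


def transform(raw_dialogue, task, from_test_archive=False):
    """Convert one raw dialog (assumed layout) into a simplified record."""
    # Drop the first utterance, assumed to be the system greeting.
    utterances = raw_dialogue["turns"][1:]
    # After the greeting is removed, the user is assumed to speak first.
    turns = [
        {"speaker": "user" if i % 2 == 0 else "system", "utterance": text}
        for i, text in enumerate(utterances)
    ]
    return {
        "domains": [raw_dialogue["domain"]],
        "data_split": assign_split(raw_dialogue["domain"], from_test_archive),
        "goal": {"description": build_goal_description(task)},
        "turns": turns,
    }


# Invented example input, for illustration only.
raw = {
    "domain": "HOME_BOT",
    "turns": [
        "Hello how may I help you?",
        "Turn off the kitchen lights.",
        "Okay, the kitchen lights are off.",
    ],
}
task = {
    "user_role": "You are interacting with a bot that manages your home.",
    "user_prompt": "Ask the bot to turn off the kitchen lights.",
    "system_role": "You are a bot that manages the user's home.",
    "system_prompt": "Do what the user asks.",
}
record = transform(raw, task)
print(record["data_split"], len(record["turns"]))
```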

Data Splits

Split	Dialogues	Utterances	Avg. utterances	Avg. tokens	Avg. domains
Train	34261	357092	10.42	7.48	1
Validation	3623	37060	10.23	6.59	1
Test	2319	23882	10.3	7.96	1
All	40203	418034	10.4	7.43	1
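
The figures in this table can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the table; it confirms that the per-split counts add up to the "all" row, that each average utterance count equals utterances divided by dialogs (rounded to two decimals), and that train plus validation gives 37,884, the dialog count quoted in the dataset overview. That last match suggests, but does not by itself prove, that the test split is counted on top of the 37,884 figure.

```python
# Numbers copied from the split table above.
splits = {
    "train": {"dialogues": 34261, "utterances": 357092, "avg_utt": 10.42},
    "validation": {"dialogues": 3623, "utterances": 37060, "avg_utt": 10.23},
    "test": {"dialogues": 2319, "utterances": 23882, "avg_utt": 10.3},
}
reported_all = {"dialogues": 40203, "utterances": 418034, "avg_utt": 10.4}

# Totals across the three splits.
total_dialogues = sum(s["dialogues"] for s in splits.values())
total_utterances = sum(s["utterances"] for s in splits.values())

# Average utterances per dialog, recomputed from the raw counts.
computed_avg = {
    name: round(s["utterances"] / s["dialogues"], 2) for name, s in splits.items()
}
computed_avg_all = round(total_utterances / total_dialogues, 2)

# Train + validation, to compare with the 37,884 dialogs in the overview.
train_plus_validation = splits["train"]["dialogues"] + splits["validation"]["dialogues"]

print(total_dialogues, total_utterances)
print(computed_avg, computed_avg_all)
print(train_plus_validation)
```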

Citation

@inproceedings{li2020results,
  author = {Li, Jinchao and Peng, Baolin and Lee, Sungjin and Gao, Jianfeng and Takanobu, Ryuichi and Zhu, Qi and Minlie Huang and Schulz, Hannes and Atkinson, Adam and Adada, Mahmoud},
  title = {Results of the Multi-Domain Task-Completion Dialog Challenge},
  booktitle = {Proceedings of the 34th AAAI Conference on Artificial Intelligence, Eighth Dialog System Technology Challenge Workshop},
  year = {2020},
  month = {February},
  url = {https://www.microsoft.com/en-us/research/publication/results-of-the-multi-domain-task-completion-dialog-challenge/},
}

License Information

Microsoft Research Data License Agreement
