
google-research-datasets/schema_guided_dstc8

Hugging Face, updated 2024-01-18, indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/google-research-datasets/schema_guided_dstc8
Resource description:
---
annotations_creators:
- machine-generated
language_creators:
- crowdsourced
- machine-generated
language:
- en
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-generation
- fill-mask
- token-classification
- text-classification
task_ids:
- dialogue-modeling
- multi-class-classification
- parsing
paperswithcode_id: sgd
pretty_name: Schema-Guided Dialogue
dataset_info:
- config_name: dialogues
  features:
  - name: dialogue_id
    dtype: string
  - name: services
    sequence: string
  - name: turns
    sequence:
    - name: speaker
      dtype:
        class_label:
          names:
            '0': USER
            '1': SYSTEM
    - name: utterance
      dtype: string
    - name: frames
      sequence:
      - name: service
        dtype: string
      - name: slots
        sequence:
        - name: slot
          dtype: string
        - name: start
          dtype: int32
        - name: exclusive_end
          dtype: int32
      - name: state
        struct:
        - name: active_intent
          dtype: string
        - name: requested_slots
          sequence: string
        - name: slot_values
          sequence:
          - name: slot_name
            dtype: string
          - name: slot_value_list
            sequence: string
      - name: actions
        sequence:
        - name: act
          dtype:
            class_label:
              names:
                '0': AFFIRM
                '1': AFFIRM_INTENT
                '2': CONFIRM
                '3': GOODBYE
                '4': INFORM
                '5': INFORM_COUNT
                '6': INFORM_INTENT
                '7': NEGATE
                '8': NEGATE_INTENT
                '9': NOTIFY_FAILURE
                '10': NOTIFY_SUCCESS
                '11': OFFER
                '12': OFFER_INTENT
                '13': REQUEST
                '14': REQUEST_ALTS
                '15': REQ_MORE
                '16': SELECT
                '17': THANK_YOU
        - name: slot
          dtype: string
        - name: canonical_values
          sequence: string
        - name: values
          sequence: string
      - name: service_results
        sequence:
        - name: service_results_list
          sequence:
          - name: service_slot_name
            dtype: string
          - name: service_canonical_value
            dtype: string
      - name: service_call
        struct:
        - name: method
          dtype: string
        - name: parameters
          sequence:
          - name: parameter_slot_name
            dtype: string
          - name: parameter_canonical_value
            dtype: string
  splits:
  - name: train
    num_bytes: 158452984
    num_examples: 16142
  - name: validation
    num_bytes: 23553544
    num_examples: 2482
  - name: test
    num_bytes: 41342956
    num_examples: 4201
  download_size: 617805368
  dataset_size: 223349484
- config_name: schema
  features:
  - name: service_name
    dtype: string
  - name: description
    dtype: string
  - name: slots
    sequence:
    - name: name
      dtype: string
    - name: description
      dtype: string
    - name: is_categorical
      dtype: bool
    - name: possible_values
      sequence: string
  - name: intents
    sequence:
    - name: name
      dtype: string
    - name: description
      dtype: string
    - name: is_transactional
      dtype: bool
    - name: required_slots
      sequence: string
    - name: optional_slots
      sequence:
      - name: slot_name
        dtype: string
      - name: slot_value
        dtype: string
    - name: result_slots
      sequence: string
  splits:
  - name: train
    num_bytes: 31513
    num_examples: 26
  - name: validation
    num_bytes: 18798
    num_examples: 17
  - name: test
    num_bytes: 22487
    num_examples: 21
  download_size: 617805368
  dataset_size: 72798
---

# Dataset Card for The Schema-Guided Dialogue Dataset

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Repository:** [Github repository for The Schema-Guided Dialogue Dataset](https://github.com/google-research-datasets/dstc8-schema-guided-dialogue)
- **Paper:** [Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset](https://arxiv.org/abs/1909.05855)
- **Point of Contact:** [abhirast@google.com](mailto:abhirast@google.com)

### Dataset Summary

The Schema-Guided Dialogue dataset (SGD) was developed for the Dialogue State Tracking task of the Eighth Dialogue Systems Technology Challenge (DSTC8). The SGD dataset consists of over 18k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 17 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the SGD dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios.

### Supported Tasks and Leaderboards

This dataset is designed to serve as an effective test-bed for intent prediction, slot filling, state tracking (i.e., estimating the user's goal) and language generation, among other tasks for large-scale virtual assistants:

- **Generative dialogue modeling** or `dialogue-modeling`: the text of the dialogues can be used to train a sequence model on the utterances. Performance on this task is typically evaluated with delexicalized-[BLEU](https://huggingface.co/metrics/bleu), inform rate and request success.
- **Intent state tracking**, a `multi-class-classification` task: predict the belief state of the user side of the conversation; performance is measured by [F1](https://huggingface.co/metrics/f1).
- **Action prediction**, a `parsing` task: parse an utterance into the corresponding dialog acts for the system to use. [F1](https://huggingface.co/metrics/f1) is typically reported.

### Languages

The text in the dataset is in English (`en`).
## Dataset Structure

### Data Instances

- `dialogues` configuration (default): Each dialogue is represented as a sequence of turns, each containing a user or system utterance. The annotations for each turn are grouped into frames, where each frame corresponds to a single service. The annotations for user turns include the active intent, the dialogue state and slot spans for the different slot values mentioned in the turn. For system turns, we have the system actions representing the semantics of the system utterance. Each system action is represented using a dialogue act with optional parameters.
- `schema` configuration: In addition to the dialogues, for each service used in the dataset, a normalized representation of the interface exposed is provided as the schema. The schema contains details like the name of the service, the list of tasks supported by the service (intents) and the attributes of the entities used by the service (slots). The schema also contains natural language descriptions of the service, intents and slots, which can be used for developing models that condition their predictions on the schema.

### Data Fields

Each dialog instance has the following fields:

- `dialogue_id`: A unique identifier for a dialogue.
- `services`: A list of services present in the dialogue.
- `turns`: A list of annotated system or user utterances. Each turn consists of the following fields:
  - `speaker`: The speaker for the turn. Either `USER` or `SYSTEM`.
  - `utterance`: A string containing the natural language utterance.
  - `frames`: A list of frames, each frame containing annotations for a single service and consisting of the following fields:
    - `service`: The name of the service corresponding to the frame. The slots and intents used in the following fields are taken from the schema of this service.
    - `slots`: A list of slot spans in the utterance, only provided for non-categorical slots. Each slot span contains the following fields:
      - `slot`: The name of the slot.
      - `start`: The index of the starting character in the utterance corresponding to the slot value.
      - `exclusive_end`: The index of the character just after the last character corresponding to the slot value in the utterance.
    - `actions`: A list of actions corresponding to the system. Each action has the following fields:
      - `act`: The type of action.
      - `slot`: (optional) A slot argument for some of the actions.
      - `values`: (optional) A list of values assigned to the slot. If the values list is non-empty, then the slot must be present.
      - `canonical_values`: (optional) The values in their canonicalized form as used by the service. It is a list of strings of the same length as values.
    - `service_call`: (system turns only, optional) The request sent to the service. It consists of the following fields:
      - `method`: The name of the intent or function of the service or API being executed.
      - `parameters`: A pair of lists of the same lengths: `parameter_slot_name` contains slot names and `parameter_canonical_value` contains the corresponding values in their canonicalized form.
    - `service_results`: (system turns only, optional) A list of entities containing the results obtained from the service. It is only available for turns in which a service call is made. Each entity is represented as a pair of lists of the same length: `service_slot_name` contains slot names and `service_canonical_value` contains the corresponding canonical values.
    - `state`: (user turns only) The dialogue state corresponding to the service. It consists of the following fields:
      - `active_intent`: The intent corresponding to the service of the frame which is currently being fulfilled by the system. It takes the value "NONE" if none of the intents are active.
      - `requested_slots`: A list of slots requested by the user in the current turn.
      - `slot_values`: A pair of lists of the same lengths: `slot_name` contains slot names and `slot_value_list` contains the corresponding lists of strings. For categorical slots, this list contains a single value assigned to the slot. For non-categorical slots, all the values in this list are spoken variations of each other and are equivalent (e.g., "6 pm", "six in the evening", "evening at 6" etc.).

The mapping from action ID to action name is the following:

- 0: AFFIRM
- 1: AFFIRM_INTENT
- 2: CONFIRM
- 3: GOODBYE
- 4: INFORM
- 5: INFORM_COUNT
- 6: INFORM_INTENT
- 7: NEGATE
- 8: NEGATE_INTENT
- 9: NOTIFY_FAILURE
- 10: NOTIFY_SUCCESS
- 11: OFFER
- 12: OFFER_INTENT
- 13: REQUEST
- 14: REQUEST_ALTS
- 15: REQ_MORE
- 16: SELECT
- 17: THANK_YOU

### Data Splits

The dataset is split into a `train`, `validation`, and `test` split with the following sizes:

|                     | train | validation |  test |
|---------------------|------:|-----------:|------:|
| Number of dialogues | 16142 |       2482 |  4201 |
| Number of turns     | 48426 |       7446 | 12603 |

## Dataset Creation

### Curation Rationale

The data was collected by first using a dialogue simulator to generate dialogue outlines and then paraphrasing them to obtain natural utterances. Using a dialogue simulator ensures the coverage of a large variety of dialogue flows by filtering out similar flows in the simulation phase to create a diverse dataset, and dialogues can be generated with their annotation, as opposed to a Wizard-of-Oz setup which is prone to manual annotation errors.

### Source Data

#### Initial Data Collection and Normalization

The dialogue outlines are first generated by a simulator. The dialogue simulator interacts with the services to generate dialogue outlines. It consists of two agents playing the roles of the user and the system, interacting with each other using a finite set of actions specified through dialogue acts over a probabilistic automaton designed to capture varied dialogue trajectories. It is worth noting that the simulation automaton does not include any domain-specific constraints: all domain-specific constraints are encoded in the schema and scenario.

The dialogue paraphrasing framework then converts the outlines generated by the simulator into a natural conversation. Users may refer to the slot values in the dialogue acts in various different ways during the conversation, e.g., “los angeles” may be referred to as “LA” or “LAX”. To introduce these natural variations in the slot values, different slot values are replaced with a randomly selected variation while being kept consistent across user turns in a dialogue. The actions are then converted to pseudo-natural language utterances using a set of manually defined action-to-text templates, and the resulting utterances for the different actions in a turn are concatenated together.

Finally, the dialogue transformed by these steps is sent to the crowd workers to be reformulated into more natural language. One crowd worker is tasked with paraphrasing all utterances of a dialogue to ensure naturalness and coherence. The crowd workers are asked to exactly repeat the slot values in their paraphrases so that the span indices for the slots can be recovered via string matching.

#### Who are the source language producers?

The language structure is machine-generated, and the language realizations are produced by crowd workers. The dataset paper does not provide demographic information for the crowd workers.

### Annotations

#### Annotation process

The annotations are automatically obtained during the initial sampling process and by string matching after reformulation.

#### Who are the annotators?

[N/A]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

The dataset was created by a team of researchers working at Google Mountain View.

### Licensing Information

The dataset is released under CC BY-SA 4.0 license.

### Citation Information

For the DSTC8 task, please cite:

```
@article{corr/abs-2002-01359,
  author        = {Abhinav Rastogi and Xiaoxue Zang and Srinivas Sunkara and Raghav Gupta and Pranav Khaitan},
  title         = {Schema-Guided Dialogue State Tracking Task at {DSTC8}},
  journal       = {CoRR},
  volume        = {abs/2002.01359},
  year          = {2020},
  url           = {https://arxiv.org/abs/2002.01359},
  archivePrefix = {arXiv},
  eprint        = {2002.01359}
}
```

For the initial release paper please cite:

```
@inproceedings{aaai/RastogiZSGK20,
  author    = {Abhinav Rastogi and Xiaoxue Zang and Srinivas Sunkara and Raghav Gupta and Pranav Khaitan},
  title     = {Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset},
  booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI} 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA, February 7-12, 2020},
  pages     = {8689--8696},
  publisher = {{AAAI} Press},
  year      = {2020},
  url       = {https://aaai.org/ojs/index.php/AAAI/article/view/6394}
}
```

### Contributions

Thanks to [@yjernite](https://github.com/yjernite) for adding this dataset.
Provider:
google-research-datasets
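Summary of original information

Dataset card: Schema-Guided Dialogue dataset

Dataset description

Dataset summary

The Schema-Guided Dialogue dataset (SGD) was developed for the Dialogue State Tracking task of the Eighth Dialogue Systems Technology Challenge (DSTC8). The dataset consists of over 18k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 17 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the SGD dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios.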

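Supported tasks and leaderboards

This dataset is designed to serve as an effective test-bed for intent prediction, slot filling, state tracking (i.e., estimating the user's goal) and language generation, among other tasks for large-scale virtual assistants:

  • Generative dialogue modeling (dialogue-modeling): the text of the dialogues can be used to train a sequence model. Performance on this task is typically evaluated with delexicalized BLEU, inform rate and request success.
  • Intent state tracking, a multi-class-classification task: predict the belief state of the user side of the conversation; performance is measured by F1.
  • Action prediction, a parsing task: parse an utterance into the corresponding dialogue acts for the system to use. F1 is typically reported.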
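
As a rough illustration of the F1 metric mentioned for action prediction, the sketch below scores one predicted set of dialogue acts against a reference set. The set-based formulation and the function name `act_f1` are assumptions made for illustration; the card does not say how F1 is aggregated over turns or dialogues.

```python
def act_f1(predicted, reference):
    """Set-based F1 between predicted and reference dialogue acts (assumed formulation)."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # nothing to predict and nothing predicted
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction for a single system turn.
score = act_f1(["INFORM", "REQUEST"], ["INFORM", "OFFER"])
print(score)
```

Here one of two predicted acts is correct and one of two reference acts is found, so precision and recall are both 0.5.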

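Languages

The text in the dataset is in English (en).

Dataset structure

Data instances

  • dialogues configuration (default): Each dialogue is represented as a sequence of turns, each containing a user or system utterance. The annotations for each turn are grouped into frames, where each frame corresponds to a single service. The annotations for user turns include the active intent, the dialogue state and slot spans for the different slot values. For system turns, we have the system actions representing the semantics of the system utterance. Each system action is represented using a dialogue act with optional parameters.
  • schema configuration: In addition to the dialogues, for each service used in the dataset, a normalized representation of the interface is provided as the schema. The schema contains the name of the service, the list of tasks supported by the service (intents) and the attributes of the entities used by the service (slots). The schema also contains natural language descriptions of the service, intents and slots, which can be used for developing models that condition their predictions on the schema.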
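
One way a model could condition on the schema is to flatten the natural language descriptions into text. The sketch below does this for a single record. The record is hypothetical, and it is written as plain lists of dictionaries for readability; the layout actually returned by a loader may differ.

```python
def describe_schema(schema):
    """Turn one schema record into one text line per service, intent and slot."""
    rows = [("service", schema["service_name"], schema["description"])]
    for intent in schema["intents"]:
        rows.append(("intent", intent["name"], intent["description"]))
    for slot in schema["slots"]:
        rows.append(("slot", slot["name"], slot["description"]))
    return [f"{kind} {name}: {text}" for kind, name, text in rows]


# Hypothetical schema record (not taken from the dataset).
toy_schema = {
    "service_name": "Demo_1",
    "description": "A toy service for booking tables",
    "intents": [{"name": "ReserveTable", "description": "Book a table"}],
    "slots": [{"name": "time", "description": "Time of the booking"}],
}

for line in describe_schema(toy_schema):
    print(line)
```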

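Data fields

Each dialogue instance has the following fields:

  • dialogue_id: A unique identifier for a dialogue.
  • services: A list of services present in the dialogue.
  • turns: A list of annotated system or user utterances. Each turn consists of the following fields:
    • speaker: The speaker for the turn. Either USER or SYSTEM.
    • utterance: A string containing the natural language utterance.
    • frames: A list of frames, each frame containing annotations for a single service and consisting of the following fields:
      • service: The name of the service corresponding to the frame. The slots and intents used in the following fields are taken from the schema of this service.
      • slots: A list of slot spans in the utterance, only provided for non-categorical slots. Each slot span contains the following fields:
        • slot: The name of the slot.
        • start: The index of the starting character in the utterance corresponding to the slot value.
        • exclusive_end: The index of the character just after the last character corresponding to the slot value in the utterance.
      • actions: A list of actions corresponding to the system. Each action has the following fields:
        • act: The type of action.
        • slot: (optional) A slot argument for some of the actions.
        • values: (optional) A list of values assigned to the slot. If the values list is non-empty, then the slot must be present.
        • canonical_values: (optional) The values in their canonicalized form as used by the service. It is a list of strings of the same length as values.
      • service_call: (system turns only, optional) The request sent to the service. It consists of the following fields:
        • method: The name of the intent or function of the service or API being executed.
        • parameters: A pair of lists of the same length: parameter_slot_name contains slot names and parameter_canonical_value contains the corresponding values in their canonicalized form.
      • service_results: (system turns only, optional) A list of entities containing the results obtained from the service. It is only available for turns in which a service call is made. Each entity is represented as a pair of lists of the same length: service_slot_name contains slot names and service_canonical_value contains the corresponding canonical values.
      • state: (user turns only) The dialogue state corresponding to the service. It consists of the following fields:
        • active_intent: The intent corresponding to the service of the frame which is currently being fulfilled by the system. It takes the value "NONE" if none of the intents are active.
        • requested_slots: A list of slots requested by the user in the current turn.
        • slot_values: A pair of lists of the same length: slot_name contains slot names and slot_value_list contains the corresponding lists of strings. For categorical slots, this list contains a single value assigned to the slot. For non-categorical slots, all the values in this list are spoken variations of each other and are equivalent (e.g., "6 pm", "six in the evening", "evening at 6" etc.).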
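
The field descriptions above can be made concrete with a small sketch: slicing the utterance with `start` and `exclusive_end` yields the slot value, and a "pair of lists" can be zipped into a dictionary. The turn below is hypothetical (offsets worked out by hand), and the helper names `slot_spans` and `pair_to_dict` are illustrative only.

```python
def slot_spans(utterance, slots):
    """Map each annotated slot name to the substring its span covers."""
    return {s["slot"]: utterance[s["start"]:s["exclusive_end"]] for s in slots}


def pair_to_dict(names, values):
    """Zip two parallel lists (names, values) into a dictionary."""
    if len(names) != len(values):
        raise ValueError("parallel lists must have the same length")
    return dict(zip(names, values))


# Hypothetical user turn with one frame.
utterance = "Book a table at 6 pm in Oakland"
slots = [
    {"slot": "time", "start": 16, "exclusive_end": 20},
    {"slot": "city", "start": 24, "exclusive_end": 31},
]
slot_values = {
    "slot_name": ["time", "city"],
    "slot_value_list": [["6 pm", "six in the evening"], ["Oakland"]],
}

spans = slot_spans(utterance, slots)
state = pair_to_dict(slot_values["slot_name"], slot_values["slot_value_list"])
print(spans)
print(state)
```

Because `exclusive_end` points just past the last character, it can be used directly as the upper bound of a Python slice.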

The mapping from action ID to action name is as follows:

0: AFFIRM 1: AFFIRM_INTENT 2: CONFIRM 3: GOODBYE 4: INFORM 5: INFORM_COUNT 6: INFORM_INTENT 7: NEGATE 8: NEGATE_INTENT 9: NOTIFY_FAILURE 10: NOTIFY_SUCCESS 11: OFFER 12: OFFER_INTENT 13: REQUEST 14: REQUEST_ALTS 15: REQ_MORE 16: SELECT 17: THANK_YOU
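
The mapping above can be written as a lookup table. The sketch below is a minimal example; the names `ACT_NAMES`, `ACT_IDS` and `decode_acts` are illustrative and not part of any library.

```python
# Index in the list is the action ID from the mapping above.
ACT_NAMES = [
    "AFFIRM", "AFFIRM_INTENT", "CONFIRM", "GOODBYE", "INFORM", "INFORM_COUNT",
    "INFORM_INTENT", "NEGATE", "NEGATE_INTENT", "NOTIFY_FAILURE",
    "NOTIFY_SUCCESS", "OFFER", "OFFER_INTENT", "REQUEST", "REQUEST_ALTS",
    "REQ_MORE", "SELECT", "THANK_YOU",
]
# Reverse lookup from action name to action ID.
ACT_IDS = {name: idx for idx, name in enumerate(ACT_NAMES)}


def decode_acts(act_ids):
    """Convert a list of integer action IDs to their names."""
    return [ACT_NAMES[i] for i in act_ids]


print(decode_acts([4, 13, 17]))
```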

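Data splits

The dataset is split into train, validation, and test splits with the following sizes:

|                     | train | validation |  test |
|---------------------|------:|-----------:|------:|
| Number of dialogues | 16142 |       2482 |  4201 |
| Number of turns     | 48426 |       7446 | 12603 |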
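
For a quick sense of proportions, the short calculation below uses only the counts from the table above; nothing is loaded from the dataset itself.

```python
# Counts copied from the split table.
SPLITS = {
    "train": {"dialogues": 16142, "turns": 48426},
    "validation": {"dialogues": 2482, "turns": 7446},
    "test": {"dialogues": 4201, "turns": 12603},
}

total_dialogues = sum(s["dialogues"] for s in SPLITS.values())
total_turns = sum(s["turns"] for s in SPLITS.values())

for name, s in SPLITS.items():
    share = s["dialogues"] / total_dialogues
    turns_per_dialogue = s["turns"] / s["dialogues"]
    print(f"{name}: {share:.1%} of dialogues, {turns_per_dialogue:.1f} turns per dialogue")

print(total_dialogues, total_turns)
```

The three splits in the table sum to 22,825 dialogues and 68,475 turns, and each split lists exactly three turns per dialogue.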

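Dataset creation

Curation rationale

The data was collected by first using a dialogue simulator to generate dialogue outlines and then paraphrasing them to obtain natural utterances. Using a dialogue simulator ensures the coverage of a large variety of dialogue flows by filtering out similar flows in the simulation phase to create a diverse dataset, and dialogues can be generated with their annotation, as opposed to a Wizard-of-Oz setup, which is prone to manual annotation errors.

Source data

Initial data collection and normalization

The dialogue outlines are first generated by a simulator. The dialogue simulator interacts with the services to generate dialogue outlines. It consists of two agents playing the roles of the user and the system, interacting with each other using a finite set of actions specified through dialogue acts over a probabilistic automaton designed to capture varied dialogue trajectories. It is worth noting that the simulation automaton does not include any domain-specific constraints: all domain-specific constraints are encoded in the schema and scenario.

The dialogue paraphrasing framework then converts the outlines generated by the simulator into a natural conversation. Users may refer to the slot values in the dialogue acts in various ways during the conversation, e.g., "los angeles" may be referred to as "LA" or "LAX". To introduce these natural variations in the slot values, different slot values are replaced with a randomly selected variation while being kept consistent across user turns in a dialogue. The actions are then converted to pseudo-natural language utterances using a set of manually defined action-to-text templates, and the resulting utterances for the different actions in a turn are concatenated together.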
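
The template step described above can be sketched as follows. The templates and the variant table here are hypothetical placeholders, since the actual action-to-text templates are not reproduced on this page; the sketch only illustrates picking one surface variant per slot value for the whole dialogue and concatenating one templated sentence per action.

```python
import random

# Hypothetical action-to-text templates (assumed for illustration).
TEMPLATES = {
    "REQUEST": "What is the {slot}?",
    "INFORM": "The {slot} is {value}.",
    "GOODBYE": "Goodbye.",
}
# Hypothetical surface variants of a canonical slot value.
VARIANTS = {"los angeles": ["los angeles", "LA", "LAX"]}


def pick_variants(values, rng):
    """Choose one variant per canonical value, once for the whole dialogue."""
    return {v: rng.choice(VARIANTS.get(v, [v])) for v in values}


def render_turn(actions, chosen):
    """Fill one template per action and concatenate the results."""
    parts = []
    for action in actions:
        value = action.get("value")
        text = TEMPLATES[action["act"]].format(
            slot=action.get("slot", ""), value=chosen.get(value, value)
        )
        parts.append(text)
    return " ".join(parts)


rng = random.Random(0)
chosen = pick_variants(["los angeles"], rng)
turns = [
    [{"act": "INFORM", "slot": "city", "value": "los angeles"}],
    [{"act": "INFORM", "slot": "city", "value": "los angeles"}, {"act": "GOODBYE"}],
]
outlines = [render_turn(turn, chosen) for turn in turns]
print(outlines)
```

Choosing the variant once and reusing the `chosen` mapping for every turn is what keeps the slot value consistent across the dialogue.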

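Finally, the dialogue transformed by these steps is sent to the crowd workers to be reformulated into more natural language. One crowd worker is tasked with paraphrasing all utterances of a dialogue to ensure naturalness and coherence. The crowd workers are asked to exactly repeat the slot values in their paraphrases so that the span indices for the slots can be recovered via string matching.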
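
Recovering span indices by string matching can be sketched with Python's `str.find`. This is a minimal illustration on a hypothetical paraphrase, and taking the first exact match is an assumption; it is not the authors' actual tooling.

```python
def recover_span(utterance, value):
    """Return (start, exclusive_end) of the first exact match, or None."""
    start = utterance.find(value)
    if start == -1:
        return None  # the paraphrase did not repeat the value verbatim
    return start, start + len(value)


# Hypothetical crowd-worker paraphrase that repeats the slot value exactly.
paraphrase = "Could you get me a table for 6 pm tonight?"
span = recover_span(paraphrase, "6 pm")
print(span)
```

If a worker rewrote the value (for example "six pm"), the lookup would return `None`, which is why exact repetition is requested.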

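Who are the source language producers?

The language structure is machine-generated, and the language realizations are produced by crowd workers. The dataset paper does not provide demographic information for the crowd workers.

Annotations

Annotation process

The annotations are automatically obtained during the initial sampling process and by string matching after reformulation.

Who are the annotators?

[N/A]

Personal and sensitive information

[More Information Needed]

Considerations for using the data

Social impact of dataset

[More Information Needed]

Discussion of biases

[More Information Needed]

Other known limitations

[More Information Needed]

Additional information

Dataset curators

The dataset was created by a team of researchers working at Google Mountain View.

Licensing information

The dataset is released under the CC BY-SA 4.0 license.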

Citation information

For the DSTC8 task, please cite:

@article{corr/abs-2002-01359, author = {Abhinav Rastogi and Xiaoxue Zang and Srinivas Sunkara and Raghav Gupta and Pranav Khaitan}, title = {Schema-Guided Dialogue State Tracking Task at {DSTC8}}, journal = {CoRR}, volume = {abs/2002.01359}, year = {2020}, url = {https://arxiv.org/abs/2002.01359}, archivePrefix = {arXiv}, eprint = {2002.01359} }

For the initial release paper, please cite:

@inproceedings{aaai/RastogiZSGK20, author = {Abhinav Rastogi and Xiaoxue Zang and Srinivas Sunkara and Raghav Gupta and Pranav Khaitan}, title = {Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset}, booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI} 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA, February 7-12, 2020}, pages = {8689--8696}, publisher = {{AAAI} Press}, year = {2020}, url = {https://aaai.org/ojs/index.php/AAAI/article/view/6394} }

Contributions

Thanks to @yjernite for adding this dataset.

Collected summary
Dataset introduction
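Construction method
The Schema-Guided Dialogue dataset was built by combining automation with human intervention. A dialogue simulator first generates dialogue outlines, which crowd workers then turn into natural conversations. The two agents in the simulator (user and system) interact through a predefined set of dialogue acts to produce varied dialogue trajectories. A dialogue paraphrasing framework then converts the actions into pseudo-natural language utterances while keeping slot values consistent. Finally, crowd workers rewrite the generated dialogues in natural language to ensure naturalness and coherence.
Features
The dataset consists of multi-domain, task-oriented dialogues covering services and APIs in 17 domains, such as banks, events, media, calendar, travel, and weather. Beyond the dialogues themselves, it provides a normalized interface description (schema) for each service, including the service name, the supported intents, and the entity attributes. Its construction also emphasizes diversity and coverage: the dialogue simulator generates varied dialogue flows and filters out similar ones.
Usage
Using the dialogue instances and annotations, researchers can train and evaluate models for intent prediction, slot filling, state tracking, and language generation. The dataset is split into training, validation, and test sets. The schema information can additionally help models understand service interfaces and dialogue context, which may improve multi-domain dialogue systems.
Background and challenges
Background overview
The Schema-Guided Dialogue dataset (SGD) was developed by a research team at Google Mountain View to support the Dialogue State Tracking task of the Eighth Dialogue Systems Technology Challenge (DSTC8). It contains over 18k multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve services and APIs in 17 domains, including banks, events, media, calendar, travel, and weather. A distinguishing feature of SGD is that it provides a normalized representation of each service, the schema, which contains the service name, the supported tasks (intents), and the entity attributes (slots). The natural language descriptions in the schema can be used to train models that condition their predictions on the schema.
Current challenges
Challenges in building SGD include: generating dialogues that cover a wide range of dialogue flows while remaining natural, by combining a simulator with crowdsourcing; ensuring that simulated dialogues accurately reflect dialogue states in real scenarios; and handling any personal or sensitive information the dataset may contain. The multi-domain nature of the dataset also makes annotation and model generalization harder. In applications, how to use the schema information to improve the generalization and robustness of dialogue systems, and how to handle possible biases and limitations, remain open research questions.
Common scenarios
Classic use cases
The Schema-Guided Dialogue dataset is designed for multi-domain, task-oriented dialogue systems, and its classic use is training dialogue state tracking models. Its dialogue instances let models learn to predict user intents, dialogue states, and the associated slot information.
Academic problems addressed
The dataset addresses the research problem of building large-scale, diverse task-oriented dialogue systems. By including service schemas and service call results, it gives researchers a platform for evaluating and improving dialogue state tracking, intent prediction, slot filling, and related tasks in multi-domain settings.
Derived work
Work derived from the Schema-Guided Dialogue dataset includes, among others, dialogue generation models, dialogue state tracking algorithms, and multi-turn dialogue understanding methods.
The content above was collected and summarized by 遇见数据集.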