NLU++ (NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue)

OpenDataLab · Updated 2026-05-31 · Added 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/NLU_plus_plus
Resource description:
NLU++ is a natural language understanding (NLU) dataset for task-oriented dialogue (ToD) systems, designed to provide a more challenging evaluation environment for conversational NLU models that keeps pace with current applications and industry requirements. NLU++ covers two domains (banking and hotels) and offers several important improvements over commonly used NLU datasets:
1) It provides fine-grained domain ontologies with a large number of challenging multi-intent sentences. It introduces and validates the idea of intent modules, which can be combined into complex intents that express complex user goals, together with a finer-grained and therefore more challenging slot set.
2) The ontology is divided into domain-specific and generic (i.e., domain-general) intent modules. The generic modules overlap across domains, which makes annotated examples reusable across domains.
3) The dataset design is inspired by problems observed in industrial ToD systems.
4) The data were collected, filtered, and carefully annotated by conversational NLU experts, resulting in high-quality annotations.
Dataset list:
Banking: online banking queries with their corresponding intent annotations.
Span Extraction: data used in the Span-ConveRT paper.
NLU++: a challenging evaluation environment for conversational NLU models (multi-domain, multi-label intents and slots).
EVI: a challenging multilingual dataset for knowledge-based enrolment, verification, and identification in spoken dialogue systems.
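The intent-module idea in point 1) and the cross-domain reuse in point 2) can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the `Example` record, its field names, and the module names in `GENERIC_MODULES` are hypothetical and do not reproduce the official NLU++ file format or ontology. It shows one utterance carrying several intent labels at once, and how examples whose labels are all domain-general could be reused in the other domain.

```python
from dataclasses import dataclass, field

# Hypothetical subset of generic (domain-general) intent modules shared by
# both domains; the names are illustrative, not the official NLU++ ontology.
GENERIC_MODULES = frozenset({"cancel", "change", "how", "when", "why"})


@dataclass
class Example:
    """One annotated utterance (hypothetical schema)."""
    text: str
    domain: str
    intents: list[str]  # multi-label: several intent modules may be active
    slots: dict[str, str] = field(default_factory=dict)


def generic_part(ex: Example) -> set[str]:
    """Return the domain-general intent modules present in an example."""
    return set(ex.intents) & GENERIC_MODULES


def reusable_in(examples: list[Example], target_domain: str) -> list[Example]:
    """Examples from other domains whose intent labels are all generic,
    so their intent annotation stays valid in `target_domain`."""
    return [
        ex
        for ex in examples
        if ex.domain != target_domain
        and ex.intents
        and set(ex.intents) <= GENERIC_MODULES
    ]


data = [
    # Complex goal "change a booking" = generic "change" + specific "booking".
    Example("I need to change my booking for next Friday", "hotels",
            ["change", "booking"], {"date": "next Friday"}),
    Example("How do I cancel it?", "banking", ["how", "cancel"]),
    Example("Why was my card declined?", "banking", ["why", "card"]),
]

print(generic_part(data[0]))  # {'change'}
print([ex.text for ex in reusable_in(data, "hotels")])  # ['How do I cancel it?']
```

Here the reuse check only looks at intent labels; slot annotations may still be domain-specific and would need their own check.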
Provider:
OpenDataLab
Created:
2022-09-01
Dataset Introduction
Background and Challenges
Background Overview
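NLU++ is a natural language understanding dataset for task-oriented dialogue covering two domains, banking and hotels. It provides multi-label intents and rich slot annotations to build a more challenging evaluation environment. Through fine-grained domain ontologies, intent modules that are reusable across domains, and high-quality expert annotation grounded in problems observed in industry, the dataset aims to improve the generalization ability of conversational NLU models.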
The above content was collected and summarized by 遇见数据集 (Meet Datasets).
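Because an NLU++ utterance can carry several intent labels, a prediction can be partly right. A metric commonly used in this multi-label setting is micro-averaged F1, which pools decisions over all (utterance, label) pairs. The Python sketch below assumes that metric and uses hypothetical label sets; it is not the official NLU++ evaluation script. It contrasts micro-F1 with exact-match accuracy, which gives no credit for a partly correct label set.

```python
def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN counts over all examples."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but wrong
        fn += len(g - p)  # correct labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def exact_match(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Fraction of examples whose predicted label set equals the gold set."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical gold and predicted intent sets for three utterances.
gold = [{"change", "booking"}, {"how", "cancel"}, {"why", "card"}]
pred = [{"change", "booking"}, {"cancel"}, {"why", "card", "balance"}]

print(round(micro_f1(gold, pred), 4))     # 0.8333 (tp=5, fp=1, fn=1)
print(round(exact_match(gold, pred), 4))  # 0.3333
```

The gap between the two scores (0.8333 vs. 0.3333 on the same predictions) shows why the choice of metric matters when utterances combine several intent modules.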