NLU++
Dataset Collection Overview
Dataset List
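- Banking: online banking queries annotated with their corresponding intents.
- Span Extraction: the datasets used in the Span-ConveRT paper.
- NLU++: a challenging evaluation environment for dialogue NLU models, with multi-domain, multi-label intents and slots.
- EVI: a multilingual dataset for enrolment, verification, and identification in knowledge-based spoken dialogue systems.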
Banking Dataset
Dataset Statistics
- Training examples: 10003
- Test examples: 3080
- Number of intents: 77
Example Queries
| Example query | Intent |
|---|---|
| Is there a way to know when my card will arrive? | card_arrival |
| I think my card is broken | card_not_working |
| I made a mistake and need to cancel a transaction | cancel_transfer |
| Is my card usable anywhere? | card_acceptance |
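The query/intent pairs in the table can be loaded and tallied with a few lines of Python. This is a minimal sketch, assuming the data is stored as a CSV with `text` and `category` columns; that layout, the function name `load_intent_examples`, and the in-memory sample are assumptions made for illustration, not something this page confirms.

```python
import csv
import io
from collections import Counter


def load_intent_examples(csv_file):
    """Read (query, intent) pairs from a CSV file object.

    Assumes a header row with `text` and `category` columns
    (an assumption of this sketch, not confirmed by the dataset card).
    """
    reader = csv.DictReader(csv_file)
    return [(row["text"], row["category"]) for row in reader]


# Tiny in-memory sample built from the example table above,
# standing in for a real training file.
sample_csv = io.StringIO(
    "text,category\n"
    "Is there a way to know when my card will arrive?,card_arrival\n"
    "I think my card is broken,card_not_working\n"
    "I made a mistake and need to cancel a transaction,cancel_transfer\n"
    "Is my card usable anywhere?,card_acceptance\n"
)

examples = load_intent_examples(sample_csv)

# Count how many queries fall under each intent label.
intent_counts = Counter(intent for _, intent in examples)

print(len(examples), "examples,", len(intent_counts), "distinct intents")
```

If the layout assumption holds, running the same two steps on the full training split should report counts in line with the statistics listed above (10003 examples, 77 intents).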
Citation
When using the Banking dataset, please cite the following paper:

    @inproceedings{Casanueva2020,
        author    = {I{\~{n}}igo Casanueva and Tadas Temcinas and Daniela Gerz and Matthew Henderson and Ivan Vulic},
        title     = {Efficient Intent Detection with Dual Sentence Encoders},
        year      = {2020},
        month     = {mar},
        note      = {Data available at https://github.com/PolyAI-LDN/task-specific-datasets},
        url       = {https://arxiv.org/abs/2003.04807},
        booktitle = {Proceedings of the 2nd Workshop on NLP for ConvAI - ACL 2020}
    }
Span Extraction Dataset
Data Structure
The dataset has the following file structure:
    span_extraction/restaurant8k
        test.json
        train_0.json
        train_1.json
        train_2.json
        ...
Where:
- test.json contains the evaluation examples.
- train_0.json contains all of the training examples.
- train_{i}.json contains 1/(2^i) of the training data.
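The naming rule for the training shards can be made concrete with a small helper. This is an illustrative sketch only: `shard_name` and `shard_fraction` are hypothetical names, and the code restates the 1/(2^i) rule rather than reading any files.

```python
from fractions import Fraction


def shard_name(i: int) -> str:
    """File name of the i-th training shard, following the train_{i}.json pattern."""
    return f"train_{i}.json"


def shard_fraction(i: int) -> Fraction:
    """Fraction of the full training data held by train_{i}.json, i.e. 1/(2^i)."""
    if i < 0:
        raise ValueError("shard index must be non-negative")
    return Fraction(1, 2 ** i)


# List the first few shards with the share of training data each one holds.
for i in range(4):
    print(shard_name(i), shard_fraction(i))
```

Each step up in `i` halves the available training data, which makes the higher-numbered shards convenient for few-shot experiments.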
Example

    {
        "userInput": {
            "text": "I would like a table for one person"
        },
        "labels": [
            {
                "slot": "people",
                "valueSpan": {
                    "startIndex": 25,
                    "endIndex": 35
                }
            }
        ]
    }
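To see how the span indices map back to the text, the example can be parsed and sliced directly. This is a minimal sketch, assuming `startIndex` is inclusive and `endIndex` is exclusive (which is consistent with the example, where characters 25 to 34 spell "one person"). The helper name `extract_spans` and the fallback of 0 for a missing `startIndex` are assumptions of this sketch.

```python
import json

example_json = """
{
  "userInput": {"text": "I would like a table for one person"},
  "labels": [
    {"slot": "people", "valueSpan": {"startIndex": 25, "endIndex": 35}}
  ]
}
"""


def extract_spans(example):
    """Return (slot, value) pairs by slicing the user text with each span.

    Assumes startIndex is inclusive and endIndex is exclusive; a missing
    startIndex is treated as 0 (an assumption of this sketch).
    """
    text = example["userInput"]["text"]
    spans = []
    for label in example.get("labels", []):
        span = label["valueSpan"]
        start = span.get("startIndex", 0)
        end = span["endIndex"]
        spans.append((label["slot"], text[start:end]))
    return spans


example = json.loads(example_json)
print(extract_spans(example))  # [('people', 'one person')]
```

Returning a list of pairs rather than a dictionary keeps every labelled span, even if the same slot appears more than once in an utterance.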
Citation
When using the Span Extraction dataset, please cite the following paper:

    @inproceedings{CoopeFarghly2020,
        Author    = {Sam Coope and Tyler Farghly and Daniela Gerz and Ivan Vulić and Matthew Henderson},
        Title     = {Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations},
        Year      = {2020},
        url       = {https://arxiv.org/abs/2005.08866},
        publisher = {ACL},
    }
License
The datasets in this repository are covered by the license in the LICENSE file.

- NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue. PolyAI Limited, London, UK, 2022.



