MEDIA
arXiv, indexed 2025-09-30
Download link:
http://www.lrec-conf.org/proceedings/lrec2004/pdf/356.pdf
Resource description:
This dataset contains 1,258 transcribed dialogues about hotel booking and information requests, manually annotated with semantic concepts. Its 76 semantic tags make the MEDIA task one of the most challenging slot-filling tasks. The dialogues are split into 13,000 sentences for training, 1,300 for development, and 3,500 for testing. The dataset is suited to research on spoken language understanding (SLU).
Provider:
ELRA



