DIALSTORY
arXiv | Updated 2022-12-12 | Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2209.08524v2
Resource overview:
DIALSTORY is a dataset created by the CoAI Group at the Department of Computer Science and Technology, Tsinghua University, and the Beijing National Research Center for Information Science and Technology. It contains 105,000 Chinese stories. Each story includes at least 10 dialogue turns, and dialogue accounts for 30% to 50% of the story's total length. The dataset supports the evaluation of two new tasks, Masked Dialogue Generation and Dialogue Speaker Recognition, which test whether a model can understand character traits and inter-character relationships well enough to generate dialogue in a story or to identify who is speaking. Construction involved random sampling, automatic annotation of dialogue turns and character identification, and manual verification of annotation accuracy. The dataset is mainly intended to improve machine capabilities in story understanding, dialogue generation, and character recognition, with the aim of advancing applications of artificial intelligence in literary creation and interactive games.
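The two selection criteria stated above (at least 10 dialogue turns, and dialogue making up 30% to 50% of a story) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that dialogue turns are delimited by Chinese curly quotes “…” and that length is measured in characters, neither of which is specified here. The names `dialogue_stats` and `meets_criteria` are hypothetical.

```python
import re
from typing import Tuple

# Assumption: a dialogue turn is any span enclosed in Chinese curly quotes.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def dialogue_stats(story: str) -> Tuple[int, float]:
    """Return (number of dialogue turns, share of characters inside dialogue)."""
    turns = DIALOGUE_RE.findall(story)
    dialogue_chars = sum(len(turn) for turn in turns)
    # Assumption: story length is counted in characters, quotes included.
    ratio = dialogue_chars / len(story) if story else 0.0
    return len(turns), ratio


def meets_criteria(story: str, min_turns: int = 10,
                   low: float = 0.3, high: float = 0.5) -> bool:
    """Check the stated thresholds: >= 10 turns and a 30%-50% dialogue share."""
    n_turns, ratio = dialogue_stats(story)
    return n_turns >= min_turns and low <= ratio <= high


if __name__ == "__main__":
    toy_story = "他说:“你好吗”" * 10  # toy example, not taken from the dataset
    print(dialogue_stats(toy_story), meets_criteria(toy_story))
```

A real filter would likely need a more careful definition of a dialogue turn (nested quotes, other quotation styles), so the thresholds here should be read only as a restatement of the description.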
Provider:
Tsinghua University
Created:
2022-09-18



