A Dataset for Sentence Retrieval for Open-Ended Dialogues

Name: A Dataset for Sentence Retrieval for Open-Ended Dialogues
Creator: Technion – Israel Institute of Technology
Published: 2022-05-24 08:51:39
License: No description available

Source: arXiv. Updated: 2022-05-24. Indexed: 2024-06-21.

Download link:

https://github.com/SIGIR-2022/A-Datasetfor-Sentence-Retrieval-for-Open-Ended-Dialogues.git

Resource overview:

This dataset, A Dataset for Sentence Retrieval for Open-Ended Dialogues, was created by a research team at the Technion – Israel Institute of Technology. It contains 846 open-ended dialogues extracted from Reddit, each paired with 50 candidate sentences retrieved from Wikipedia. Human annotators judged whether each candidate sentence contains information useful for generating the next turn of the corresponding dialogue. Construction involved three steps: dialogue collection, candidate sentence retrieval, and manual annotation. The dataset is intended primarily for dialogue-system research, where it targets the problem of retrieving information for open-ended dialogues, with the aim of making dialogue systems more natural and interactive.
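To make the structure described above concrete, the following is a minimal sketch of how one dialogue with its annotated candidate sentences might be represented and scored. The class and field names (`Candidate`, `DialogueRecord`, `useful`) and the toy texts are hypothetical illustrations, not the file format used in the repository, and precision at k is just one common retrieval metric chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    sentence: str  # a Wikipedia sentence (hypothetical field name)
    useful: bool   # human judgment: useful for generating the next turn?


@dataclass
class DialogueRecord:
    turns: List[str]             # dialogue turns, oldest first
    candidates: List[Candidate]  # the dataset pairs 50 with each dialogue


def precision_at_k(ranked: List[Candidate], k: int) -> float:
    """Fraction of the top-k ranked candidates that are labeled useful."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(c.useful for c in top) / len(top)


# Toy record with made-up text and labels, shortened to 4 candidates.
record = DialogueRecord(
    turns=["Has anyone tried baking sourdough?", "Yes, but my starter died."],
    candidates=[
        Candidate("A sourdough starter needs regular feeding.", True),
        Candidate("Bread is a staple food in many cultures.", False),
        Candidate("A weak starter can often be revived with flour and water.", True),
        Candidate("Reddit was founded in 2005.", False),
    ],
)

# Treat the list order as a retrieval system's ranking and score it.
print(precision_at_k(record.candidates, 2))
```

A retrieval method evaluated on the dataset would reorder `candidates` for each dialogue before scoring; the actual data layout should be checked against the repository linked above.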

Provider:

Technion – Israel Institute of Technology

Created:

2022-05-24
