five

Statcan Dialogue Dataset

收藏
DataONE2023-04-06 更新2024-06-08 收录
下载链接:
https://search.dataone.org/view/sha256:efeb733db49b580efaba0b9ac5db885107c13a3d48ad4ec2f15be7ec3f210b02
下载链接
链接失效反馈
官方服务:
资源简介:
Welcome to the data repository for requesting access to the Statcan Dialogue Dataset! Before requesting access, you can visit our website or read our EACL 2023 paper Requesting Access In order to use our dataset, you must agree to the terms of use and restrictions before requesting access (see below). We will manually review each request and grant access or reach out to you for further information. To facilitate the process, make sure that: Your Dataverse account is linked to your professional/research website, which we may review to ensure the dataset will be used for the intended purpose Your request is made with an academic (e.g. .edu) or professional email (e.g. @servicenow.com). To do this, your have to set your primary email to your academic/professional email, or create a new Dataverse account. If your academic institution does not end with .edu, or you are part of a professional group that does not have an email address, please contact us (see email in paper). Abstract: We introduce the StatCan Dialogue Dataset consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on a on-going conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations, as we observe a significant drop in performance across both tasks when we move from the validation to the test set. In addition, we find that response generation models struggle to decide when to return a table. Considering that the tasks pose significant challenges to existing models, we encourage the community to develop models for our task, which can be directly used to help knowledge workers find relevant tables for live chat users.
创建时间:
2023-12-28
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作