Fused dataset

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/14793206

Description:

The current version of the TOD (Task-Oriented Dialogues) fused dataset contains samples from the MultiWOZ2.2 (Zang et al., 2020), SpokenWOZ (Si et al., 2023), FRAMES (Asri et al., 2017), DSTC3 (Henderson et al., 2014a) and SGD (Rastogi et al., 2020) datasets. These datasets were selected because they are all high quality, with significant human validation and data cleaning. The selection also provides coverage of attributes unique to individual datasets, such as the utterance-level audio files in SpokenWOZ (Si et al., 2023). The fused dataset must span several domains, as required by the scope of the ELOQUENCE project (https://eloquenceai.eu) and its individual pilots. The data are stored in ‘.arrow’ files to optimise the speed and efficiency of data loading and to remain compatible with the popular HuggingFace datasets library (HuggingFace, 2024). The dataset is also available at https://huggingface.co/datasets/Brunel-AI/ELOQUENCE. Several datasets are currently implemented within the fused dataset; because the schema has been defined flexibly, further datasets can be added in later iterations as needs are identified. The JSON schema, together with further explanation of the attributes across all domains, is provided in Appendix 10.1 of ELOQUENCE deliverable 1.1.
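Since the description points to a HuggingFace repository, loading the data with the `datasets` library might look like the minimal sketch below. The repository ID comes from the URL above; everything else is an assumption made for illustration: the helper names `load_fused` and `split_sizes` are hypothetical, and the repository is assumed to load with its default configuration (it may instead require a configuration name or expose splits not shown here).

```python
from typing import Mapping, Sized

# Repository ID taken from the HuggingFace URL in the description.
REPO_ID = "Brunel-AI/ELOQUENCE"


def load_fused(repo_id: str = REPO_ID):
    """Download the fused dataset from the HuggingFace Hub (needs network access)."""
    # Imported lazily so split_sizes() works even without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the default configuration is sufficient; if the repository
    # defines several configurations, a name must be passed as well.
    return load_dataset(repo_id)


def split_sizes(splits: Mapping[str, Sized]) -> dict:
    """Return the number of rows in each split (hypothetical helper)."""
    return {name: len(rows) for name, rows in splits.items()}


# Example usage (requires `pip install datasets` and network access):
#   fused = load_fused()
#   print(split_sizes(fused))
```

The Arrow-backed storage means the library can memory-map the files rather than read them fully into RAM, which is the loading-speed benefit the description refers to.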

Created:

2025-02-04
