
Fused dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14793206
Description:
The current version of the TOD (Task Oriented Dialogues) fused dataset contains samples from the MultiWOZ2.2 (Zang et al., 2020), SpokenWOZ (Si et al., 2023), FRAMES (Asri et al., 2017), DSTC3 (Henderson et al., 2014a) and SGD (Rastogi et al., 2020) datasets. These datasets were selected because they are all high quality, having undergone significant human validation and data cleaning. The selection also covers distinctive attributes, such as utterance-level audio files (Si et al., 2023). The fused dataset must span several domains, as required by the scope of the ELOQUENCE project (https://eloquenceai.eu) and its individual pilots. The data are stored as Apache Arrow (‘.arrow’) files, which optimises data-loading speed and efficiency and makes them compatible with the popular HuggingFace datasets library (HuggingFace, 2024). The dataset is also available at https://huggingface.co/datasets/Brunel-AI/ELOQUENCE. At present, the fused dataset integrates the five datasets listed above. However, because the schema has been defined flexibly, further datasets can be added in later iterations as new needs are identified. The JSON schema, together with further explanation of the attributes across all domains, is provided in Appendix 10.1 of ELOQUENCE Deliverable 1.1.
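The description above notes that the schema is flexible enough for further source datasets to be added later. As a rough illustration only, the Python sketch below shows one way such a fusion layer could work: each source dataset gets a converter into a shared dialogue record, so adding a dataset means registering one more converter. The record fields (`dialogue_id`, `source`, `domains`, `turns`, `audio_path`), the `FusedDataset` class, the source names and the toy inputs are all hypothetical. They are not the actual ELOQUENCE JSON schema, which is documented in Appendix 10.1 of Deliverable 1.1, and they are not part of the HuggingFace API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical, simplified records; NOT the ELOQUENCE JSON schema.
@dataclass
class Turn:
    speaker: str
    utterance: str
    audio_path: Optional[str] = None  # set only when a source ships per-utterance audio


@dataclass
class Dialogue:
    dialogue_id: str
    source: str          # name of the originating dataset
    domains: List[str]   # e.g. ["hotel", "restaurant"]
    turns: List[Turn] = field(default_factory=list)


# A converter maps one raw record from a source into the shared Dialogue type.
Converter = Callable[[dict], Dialogue]


class FusedDataset:
    """Hypothetical fusion layer: one converter per registered source."""

    def __init__(self) -> None:
        self._converters: Dict[str, Converter] = {}
        self.dialogues: List[Dialogue] = []

    def register_source(self, name: str, converter: Converter) -> None:
        self._converters[name] = converter

    def add(self, source: str, raw: dict) -> None:
        # Raises KeyError if the source has not been registered.
        self.dialogues.append(self._converters[source](raw))

    def by_domain(self, domain: str) -> List[Dialogue]:
        return [d for d in self.dialogues if domain in d.domains]


# Usage with two toy sources whose raw formats are invented for illustration.
fused = FusedDataset()
fused.register_source(
    "text_source",
    lambda raw: Dialogue(raw["id"], "text_source", raw["domains"],
                         [Turn(spk, text) for spk, text in raw["turns"]]),
)
fused.register_source(
    "spoken_source",
    lambda raw: Dialogue(raw["id"], "spoken_source", raw["domains"],
                         [Turn(t["spk"], t["text"], t.get("wav")) for t in raw["turns"]]),
)
fused.add("text_source", {"id": "t1", "domains": ["hotel"],
                          "turns": [("user", "I need a hotel."), ("system", "Which area?")]})
fused.add("spoken_source", {"id": "s1", "domains": ["restaurant", "hotel"],
                            "turns": [{"spk": "user", "text": "Book a table.", "wav": "s1_0.wav"}]})

print(len(fused.by_domain("hotel")))       # 2
print(len(fused.by_domain("restaurant")))  # 1
```

Keeping source-specific parsing inside converters lets the shared record type stay fixed while new datasets are added. The actual ELOQUENCE pipeline may be organised quite differently.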
Created: 2025-02-04