DialogCC
Source: arXiv. Updated: 2024-03-29. Indexed: 2024-06-21.
Download link:
https://dialogcc.github.io/
Description:
DialogCC is a high-quality multimodal dialogue dataset developed by the Korea Advanced Institute of Science and Technology (KAIST) for training general-purpose multimodal dialogue models that can handle open-domain conversations. The dataset is built by an automated pipeline that ensures dialogue quality and image diversity without manual intervention. Each dialogue and each utterance is paired with multiple images, averaging 7.34 images per dialogue and 4.77 images per utterance. This diversity is intended to improve how well multimodal dialogue models generalize to unseen dialogue datasets.
Provider:
Korea Advanced Institute of Science and Technology (KAIST)
Created:
2022-12-08



