Toloka Visual Question Answering
Source: arXiv | Updated: 2023-09-28 | Indexed: 2024-06-21
Download link:
https://doi.org/10.5281/zenodo.7057740
Description:
Toloka Visual Question Answering is a crowdsourced dataset created by JetBrains Belgrade for benchmarking machine learning systems against human expert-level performance on visual question answering (VQA). It contains 45,199 image-question pairs in English, each annotated with the bounding box of the correct answer, and is split into a training set and two test subsets. The images are drawn from the MS COCO dataset and paired with a variety of open-ended questions. Data collection and annotation were carried out on the Toloka crowdsourcing platform. The dataset is intended for evaluating and improving multimodal question answering models, particularly on tasks that require combining visual and textual information.
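Because each answer is a bounding box rather than text, predictions on a dataset like this are commonly scored by intersection over union (IoU) between the predicted and ground-truth boxes. The sketch below illustrates that computation only; the `(left, top, right, bottom)` box layout and the function name `iou` are assumptions made for illustration and are not taken from the dataset's documentation.

```python
from typing import Tuple

# Assumed box layout: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate boxes with zero total area are scored as 0.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU = {iou(predicted, ground_truth):.4f}")
```

Averaging this score over all image-question pairs in a test subset gives a single number that can be compared across models and human annotators.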
Provider:
JetBrains, Belgrade, Serbia
Created:
2023-09-28



