Spoken ObjectNet

Name: Spoken ObjectNet
Creator: MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)
Published: 2021-10-15 01:38:20
License: No description available

Source: arXiv. Updated: 2021-10-15. Indexed: 2024-06-21.

Download link:

http://groups.csail.mit.edu/sls/downloads/son/index.cgi


Overview:

Spoken ObjectNet is a large-scale spoken description dataset created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). It aims to reduce the biases found in existing audio-visual datasets and to improve model performance in real-world scenarios. The dataset is built on the ObjectNet image dataset and uses an improved data collection pipeline, including automatic language model checks, to raise the quality of the descriptions. Spoken ObjectNet contains 50,273 spoken descriptions and is used to evaluate audio-visual models in a bias-controlled setting, particularly on image retrieval and audio retrieval tasks. Its main applications are making models more robust to real-world complexity and addressing the performance degradation caused by dataset bias.
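As an illustration of how image retrieval and audio retrieval are commonly scored, the sketch below computes recall@K from a caption-by-image score matrix. It is a minimal example under stated assumptions, not the dataset authors' evaluation code: the metric choice, the function names `recall_at_k` and `transpose`, and the toy scores are all hypothetical, and it assumes that spoken caption i is paired with image i.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose paired item (same index) ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score (ties keep index order).
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def transpose(matrix):
    """Swap the query and candidate axes of a score matrix."""
    return [list(column) for column in zip(*matrix)]


# Toy scores (made up): rows are spoken captions, columns are images.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]

# Image retrieval: each spoken caption queries the set of images.
image_retrieval_r1 = recall_at_k(scores, 1)
# Audio retrieval: each image queries the set of spoken captions.
audio_retrieval_r1 = recall_at_k(transpose(scores), 1)
```

Using one score matrix for both directions keeps the two tasks comparable: only the axis treated as the query changes.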

Provided by:

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)

Created:

2021-10-15
