OmniSource

Name: OmniSource
Creator: OpenDataLab
Published: 2026-05-17 09:30:00
License: No description available

OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/OmniSource

Resource description:

We introduce OmniSource, a novel framework for leveraging web data to train video recognition models. OmniSource overcomes the barriers between data formats used in web-supervised learning, such as images, short videos, and long untrimmed videos. First, data samples in multiple formats, curated by task-specific data collection and automatically filtered by a teacher model, are converted into a unified form. Then, a joint training strategy is proposed to handle the domain gaps between multiple data sources and formats in web-supervised learning. Several good practices are adopted in joint training, including data balancing, resampling, and cross-dataset mixup. Experiments show that, by using data from multiple sources and formats, OmniSource is more data-efficient in training. With only 3.5 million images and 800,000 minutes of video crawled from the Internet without human labeling (less than 2% of the data used in prior work), models trained with OmniSource improve the Top-1 accuracy of 2D and 3D ConvNet baseline models by 3.0% and 3.9%, respectively, on the Kinetics-400 benchmark. With OmniSource, we establish new records for video recognition under different pre-training strategies. Our best models achieve Top-1 accuracies of 80.4%, 80.5%, and 83.6% on the Kinetics-400 benchmark for training from scratch, ImageNet pre-training, and IG-65M pre-training, respectively.
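
The cross-dataset mixing practice named in the description can be illustrated with a mixup-style blend of one sample from the target dataset and one from the web data. This is a minimal sketch and not the authors' implementation: the function names `mixup` and `sample_lambda`, the flat-list sample representation, and the Beta(0.2, 0.2) mixing distribution are all assumptions made for illustration.

```python
import random


def mixup(clip_a, label_a, clip_b, label_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1].

    clip_a / clip_b are flat lists of feature or pixel values (hypothetical
    representation); label_a / label_b are one-hot lists of equal length.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(clip_a) != len(clip_b) or len(label_a) != len(label_b):
        raise ValueError("samples and labels must have matching lengths")
    # Convex combination of inputs and of labels with the same weight.
    mixed_clip = [lam * a + (1.0 - lam) * b for a, b in zip(clip_a, clip_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_clip, mixed_label


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


# Usage: blend a target-dataset sample with a web sample.
target_clip, target_label = [0.0, 2.0], [1.0, 0.0]
web_clip, web_label = [4.0, 6.0], [0.0, 1.0]
mixed_clip, mixed_label = mixup(
    target_clip, target_label, web_clip, web_label, sample_lambda()
)
```

The blended label is a soft target, so the training loss would need to accept label distributions rather than class indices.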

Provider:

OpenDataLab

Created:

2022-03-17

Collection summary

Dataset introduction