ontocord/MixtureVitae-VALID

Name: ontocord/MixtureVitae-VALID
Creator: ontocord
Published: 2025-04-26 12:54:09
License: No description available

Source: Hugging Face
Updated: 2025-04-26
Indexed: 2025-07-05

Download link:

https://hf-mirror.com/datasets/ontocord/MixtureVitae-VALID

Description:

VALID (Video-Audio Large Interleaved Dataset) is a multimodal dataset of approximately 720,000 Creative Commons-licensed videos from YouTube, processed into audio-video-text records for machine learning research. The records are organized in a specific format intended for training models for multimodal understanding, for example through contrastive multimodal learning (e.g., CLIP, CLAP).

Provided by:

ontocord
