
Multilingual VIDEO Dataset with Speaker Stems, Diarized Transcripts, Scene Descriptions | 2.8M+ ...

Databricks, indexed 2025-11-22
Download link:
https://marketplace.databricks.com/details/202d9a1f-11c5-4e63-a0b5-da50ffc1687e/ACNetwork_Multilingual-VIDEO-Dataset-with-Speaker-Stems,-Diarized-Transcripts,-Scene-Descriptions-2.8M+-
Description:
The ACNetwork Multilingual Conversational Media Dataset delivers more than 2.8M hours of studio and user-generated (UGC) audio and video across 133 languages, including English, Portuguese, Spanish, Russian, French, German, Italian, Japanese, and Korean. Each file features high-fidelity audio with isolated stems, word-level and utterance-level transcripts, speaker diarization, emotion and tone labels, scene descriptions, and related metadata; a human-verified subset is provided for benchmarking. All content is fully rights-cleared and indemnified for AI/LLM training, which suits it to speech-to-text, multimodal retrieval, video generation, cross-lingual fine-tuning, and emotion classification. The dataset spans sports, news, entertainment, gaming, true crime, and other domains, with new content added monthly to keep it diverse and current. Delivery is via Google Drive or AWS S3, and data is provided in multiple formats (JSON, SRT, VTT, TXT). ACNetwork positions it as a global multilingual media dataset for enterprise-scale AI and LLM training that combines audio and video sources. Direct licensing is by inquiry only: contact ACNetwork for commercial terms or expanded sample access.
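Since SRT is listed among the delivery formats, the following is a minimal sketch of reading SRT cues into timed records. It assumes standard SubRip layout (cue number, a `start --> end` line, then text). The listing does not document the dataset's actual file schema or how speaker labels are encoded, so the `Cue` class, `parse_srt` function, and `SAMPLE` text are all hypothetical illustrations rather than the provider's format.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int    # cue number as written in the file
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # cue text, possibly multi-line


# Matches HH:MM:SS,mmm (SubRip) and also HH:MM:SS.mmm
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse standard SubRip text into a list of Cue records."""
    # Normalize line endings and drop a leading byte-order mark if present.
    content = content.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        # Skip blocks that lack the "start --> end" timing line.
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->", 1)
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start=_to_seconds(start),
                end=_to_seconds(end.split()[0]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


# Hypothetical sample; not taken from the dataset.
SAMPLE = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello there.\n\n"
    "2\n00:00:03,000 --> 00:00:04,250\nHi!\n"
)

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue.index, cue.start, cue.end, cue.text)
```

Keeping times as seconds makes it straightforward to align cues with audio stems or word-level JSON timestamps, should those use the same time base.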
Provider:
ACNetwork