Cem Mil Podcasts
Source: arXiv | Updated: 2023-12-13 | Indexed: 2024-06-21
Download link:
https://podcastsdataset.byspotify.com/
Overview:
Cem Mil Podcasts is a Portuguese-language podcast dataset designed for research on multimodal, multilingual, and multi-dialect information access. It was created by research teams at Spotify and Silo AI and contains 123,054 podcast episodes totaling more than 76,000 hours of speech audio. The dataset was built with the same sampling and descriptive statistical methods used for the corresponding English podcast dataset, with the aim of ensuring data quality and representativeness. It is suited to research in language technology, information retrieval, and media analysis, in particular the multilingual and multi-dialect processing of podcast content.
Provided by:
Spotify; Silo AI
Created:
2022-09-24



