Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection

Name: Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection
Creator: Zenodo
Published: 2022-09-08 10:25:11
License: No description available

Zenodo, updated 2022-09-08, indexed 2026-06-04

Download link:

https://zenodo.org/record/6962929

Description:

The COVYT dataset contains speech samples from individuals who self-reported their COVID-19 infection on public social media platforms (YouTube, Xiaohongshu). These videos, together with videos of the same people recorded prior to infection, were mined in an attempt to gather publicly available data for COVID-19 research. This release includes links to the original videos along with the accompanying manual segmentation and diarisation, which identify the utterances of the target individuals. Features derived from the segmented utterances are released as well. Finally, the dataset includes partitioning information for four different cross-validation schemes. See the arXiv preprint for more details: https://arxiv.org/abs/2206.11045
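
Because every speaker contributes both infected and non-infected recordings, how segments are partitioned affects whether a model is tested on voices it has already heard. This record does not describe the four released schemes or the file layout, so the following is only a minimal sketch of one common approach, a speaker-disjoint split, and not necessarily one of the dataset's own schemes. The record fields (`speaker`, `start`, `end`, `covid`) and the function name are hypothetical.

```python
from collections import defaultdict


def speaker_disjoint_folds(segments, n_folds):
    """Place all segments of a given speaker into exactly one fold."""
    by_speaker = defaultdict(list)
    for seg in segments:
        by_speaker[seg["speaker"]].append(seg)

    folds = [[] for _ in range(n_folds)]
    # Sort speaker IDs for a deterministic result, then deal them round-robin.
    for i, speaker in enumerate(sorted(by_speaker)):
        folds[i % n_folds].extend(by_speaker[speaker])
    return folds


# Hypothetical segment records; the real files may be structured differently.
segments = [
    {"speaker": "spk01", "start": 0.0, "end": 4.2, "covid": True},
    {"speaker": "spk01", "start": 1.5, "end": 6.0, "covid": False},
    {"speaker": "spk02", "start": 3.1, "end": 7.4, "covid": True},
    {"speaker": "spk02", "start": 0.8, "end": 5.5, "covid": False},
    {"speaker": "spk03", "start": 2.0, "end": 9.3, "covid": True},
    {"speaker": "spk03", "start": 4.4, "end": 8.1, "covid": False},
]

folds = speaker_disjoint_folds(segments, n_folds=2)
for k, fold in enumerate(folds):
    print(k, sorted({seg["speaker"] for seg in fold}), len(fold))
```

Holding out one fold at a time then gives test sets whose speakers never appear in training. For the actual experiments, the partitioning files shipped with the dataset should take precedence over any ad hoc split like this one.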

Provider:

Zenodo

Created:

2022-08-31