RoDia

Source: arXiv · Updated 2024-03-21 · Indexed 2024-06-21
Download link:
https://github.com/codrut2/RoDia
Description:
RoDia is the first speech dataset for Romanian dialect recognition, created by the University of Bucharest and the National University of Science and Technology POLITEHNICA Bucharest. It contains manually annotated speech samples from five distinct regions of Romania, covering both urban and rural environments, with a total duration of 2 hours. The dataset consists of 2768 audio samples, each 2.5 to 5.0 seconds long, sampled at 44.1 kHz. To build it, the research team collected interviews and programs from local television stations and ensured data quality through strict screening and verification by local annotators. RoDia targets the challenges of Romanian dialect recognition, particularly in low-resource language settings, and provides a valuable resource for dialect recognition research.
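
The figures quoted in the description (44.1 kHz sampling rate, clips of 2.5 to 5.0 seconds) can be checked against a downloaded clip. The following is a minimal sketch, not part of the RoDia repository: it assumes the clip is available as (or has been converted to) an uncompressed PCM WAV file, and the helper names `clip_stats` and `matches_description` are hypothetical.

```python
import wave

# Figures taken from the dataset description; the WAV format is an assumption.
EXPECTED_RATE_HZ = 44_100
MIN_SECONDS, MAX_SECONDS = 2.5, 5.0


def clip_stats(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def matches_description(path):
    """True if the clip has the stated sampling rate and a duration in range."""
    rate, seconds = clip_stats(path)
    return rate == EXPECTED_RATE_HZ and MIN_SECONDS <= seconds <= MAX_SECONDS
```

Running `matches_description` over every clip and counting the failures gives a quick sanity check before training; clips stored in another container format would first need conversion with a separate tool.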
Provided by:
University of Bucharest, Romania; National University of Science and Technology POLITEHNICA Bucharest, Romania
Created:
2023-09-07