Pratik/Gujarati_OpenSLR

Name: Pratik/Gujarati_OpenSLR
Creator: Pratik
Published: 2021-11-17 13:36:56
License: No description provided

Source: Hugging Face; updated 2021-11-17; added to this catalog 2024-03-04

Download link:

https://hf-mirror.com/datasets/Pratik/Gujarati_OpenSLR


Resource overview:

OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition and software related to speech recognition. It aims to be a convenient, central, hassle-free place where anyone can put the speech resources they have created so that they can be downloaded publicly. See http://www.openslr.org/contributions.html.

Supported task: Automatic Speech Recognition
Language: Gujarati

Identifier: SLR78
Summary: Data set containing recordings of native speakers of Gujarati.
Category: Speech
License: Attribution-ShareAlike 4.0 International

Downloads (use a mirror closer to you; each file also has a mirror in China):
- about.html [1.5K] (Information about the data set)
- LICENSE [20K] (License information for the data set)
- line_index_female.tsv [423K] (Lines recorded by the female speakers)
- line_index_male.tsv [393K] (Lines recorded by the male speakers)
- gu_in_female.zip [917M] (Archive containing recordings from female speakers)
- gu_in_male.zip [825M] (Archive containing recordings from male speakers)

About this resource:
This data set contains transcribed high-quality audio of Gujarati sentences recorded by volunteers. The data set consists of wave files and a TSV file (line_index.tsv). The file line_index.tsv contains an anonymized FileID and the transcription of the audio in that file. The data set has been manually quality-checked, but there might still be errors. Please report any issues in the following issue tracker on GitHub: https://github.com/googlei18n/language-resources/issues

See the LICENSE file for license information.

Copyright 2018, 2019 Google, Inc.
If you use this data in publications, please cite it as follows:

@inproceedings{he-etal-2020-open,
  title = {{Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems}},
  author = {He, Fei and Chu, Shan-Hui Cathy and Kjartansson, Oddur and Rivera, Clara and Katanova, Anna and Gutkin, Alexander and Demirsahin, Isin and Johny, Cibu and Jansche, Martin and Sarin, Supheakmungkol and Pipatsrisawat, Knot},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
  month = may,
  year = {2020},
  address = {Marseille, France},
  publisher = {European Language Resources Association (ELRA)},
  pages = {6494--6503},
  url = {https://www.aclweb.org/anthology/2020.lrec-1.800},
  ISBN = {979-10-95546-34-4},
}
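The card says each line_index TSV file pairs an anonymized FileID with the transcription of its audio, but it does not spell out the exact column layout. The Python sketch below assumes, without confirmation from the card, that each row is a FileID, a tab, then the transcription, with no header row. `load_line_index` is a hypothetical helper name, not part of any released tooling for this dataset.

```python
import io
from typing import Dict, TextIO


def load_line_index(stream: TextIO) -> Dict[str, str]:
    """Map each FileID in a line_index TSV stream to its transcription.

    ASSUMPTION (not stated on the dataset card): every non-blank row is
    <FileID><TAB><transcription> and there is no header row.
    """
    index: Dict[str, str] = {}
    for line_no, raw in enumerate(stream, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        # Split on the first tab only, so the transcription is kept whole.
        file_id, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"line {line_no}: no tab between FileID and transcription")
        index[file_id.strip()] = text.strip()
    return index


if __name__ == "__main__":
    # Hypothetical FileIDs; real ones come from line_index_female.tsv / line_index_male.tsv.
    demo = io.StringIO("id_a\tનમસ્તે\nid_b\tsecond sentence\n")
    print(load_line_index(demo))
```

For the real files, something like `with open("line_index_female.tsv", encoding="utf-8") as f: female = load_line_index(f)` should work, assuming the files are UTF-8 encoded. That is likely for Gujarati-script text, but the card does not state it.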

Provided by:

Pratik

Summary of original information

Dataset overview

Basic information

Dataset identifier: SLR78
Category: Speech
Supported task: Automatic Speech Recognition
Language: Gujarati

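Content description

Dataset contents: Gujarati recordings made by volunteers, consisting of high-quality audio files and a TSV file (line_index.tsv).
File details:
- line_index_female.tsv: index of the lines recorded by female speakers, 423K.
- line_index_male.tsv: index of the lines recorded by male speakers, 393K.
- gu_in_female.zip: archive of recordings from female speakers, 917M.
- gu_in_male.zip: archive of recordings from male speakers, 825M.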
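Because the transcription indexes and the audio archives are distributed as separate files, a loader has to join them. A minimal sketch, assuming (this is not stated on the card) that each archive holds .wav files named after their FileID, possibly inside a directory. `pair_audio_with_text` is a hypothetical helper; it takes any dict mapping FileID to transcription and uses only the standard-library `zipfile` module to list member names, so the 917M/825M archives are not extracted.

```python
import io
import posixpath
import zipfile
from typing import BinaryIO, Dict, List, Tuple, Union


def pair_audio_with_text(
    archive_source: Union[str, BinaryIO], index: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Join archive members to transcriptions by FileID.

    Returns (member_name, transcription) pairs for every FileID that has a
    matching .wav member, plus the FileIDs for which no audio was found.
    ASSUMPTION: members are named <optional dirs>/<FileID>.wav.
    """
    # Opening the archive reads its central directory; no audio is decompressed.
    with zipfile.ZipFile(archive_source) as archive:
        members = archive.namelist()

    # Zip member names always use forward slashes, so posixpath is correct
    # regardless of the host operating system.
    wav_by_id: Dict[str, str] = {}
    for name in members:
        stem, ext = posixpath.splitext(posixpath.basename(name))
        if ext.lower() == ".wav":
            wav_by_id[stem] = name

    pairs = [(wav_by_id[fid], text) for fid, text in index.items() if fid in wav_by_id]
    missing = [fid for fid in index if fid not in wav_by_id]
    return pairs, missing


if __name__ == "__main__":
    # Tiny in-memory archive with hypothetical names, standing in for gu_in_female.zip.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("recordings/id_a.wav", b"RIFF")
        zf.writestr("recordings/notes.txt", b"not audio")
    pairs, missing = pair_audio_with_text(buffer, {"id_a": "first", "id_b": "second"})
    print(pairs)    # [('recordings/id_a.wav', 'first')]
    print(missing)  # ['id_b']
```

Returning the unmatched FileIDs, rather than silently dropping them, makes it easy to spot a mismatch between an index file and its archive, for example pairing line_index_male.tsv with gu_in_female.zip by mistake.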

Copyright and license

License: Attribution-ShareAlike 4.0 International

Citation

Citation format: the BibTeX entry he-etal-2020-open (He et al., LREC 2020) given in the resource overview above.
