
Pratik/Gujarati_OpenSLR

Source: Hugging Face | Updated: 2021-11-17 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/Pratik/Gujarati_OpenSLR
Resource description:
OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition and software related to speech recognition. It aims to be a central, hassle-free place where anyone can put speech resources they have created, so that they can be downloaded publicly. See http://www.openslr.org/contributions.html

Supported task: Automatic Speech Recognition

Languages: Gujarati

Identifier: SLR78
Summary: Data set which contains recordings of native speakers of Gujarati.
Category: Speech
License: Attribution-ShareAlike 4.0 International

Downloads (use a mirror closer to you; each file also lists a China mirror):

  • about.html [1.5K] (information about the data set)
  • LICENSE [20K] (license information for the data set)
  • line_index_female.tsv [423K] (lines recorded by the female speakers)
  • line_index_male.tsv [393K] (lines recorded by the male speakers)
  • gu_in_female.zip [917M] (archive containing recordings from female speakers)
  • gu_in_male.zip [825M] (archive containing recordings from male speakers)

About this resource:

This data set contains transcribed high-quality audio of Gujarati sentences recorded by volunteers. The data set consists of wave files and a TSV file (line_index.tsv). The file line_index.tsv contains an anonymized FileID and the transcription of the audio in the file.

The data set has been manually quality checked, but there might still be errors. Please report any issues in the following issue tracker on GitHub: https://github.com/googlei18n/language-resources/issues

See the LICENSE file for license information.

Copyright 2018, 2019 Google, Inc.
If you use this data in publications, please cite it as follows:

    @inproceedings{he-etal-2020-open,
      title = {{Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems}},
      author = {He, Fei and Chu, Shan-Hui Cathy and Kjartansson, Oddur and Rivera, Clara and Katanova, Anna and Gutkin, Alexander and Demirsahin, Isin and Johny, Cibu and Jansche, Martin and Sarin, Supheakmungkol and Pipatsrisawat, Knot},
      booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
      month = may,
      year = {2020},
      address = {Marseille, France},
      publisher = {European Language Resources Association (ELRA)},
      pages = {6494--6503},
      url = {https://www.aclweb.org/anthology/2020.lrec-1.800},
      ISBN = {979-10-95546-34-4},
    }
Provided by:
Pratik
Summary of original information

Dataset overview

Basic information

  • Dataset identifier: SLR78
  • Category: Speech
  • Supported task: Automatic Speech Recognition
  • Language: Gujarati

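Content description

  • Dataset contents: recordings of Gujarati made by volunteers, comprising high-quality audio files and a TSV file (line_index.tsv).
  • File details
    • line_index_female.tsv: index of the lines recorded by female speakers, 423K.
    • line_index_male.tsv: index of the lines recorded by male speakers, 393K.
    • gu_in_female.zip: archive of recordings from female speakers, 917M.
    • gu_in_male.zip: archive of recordings from male speakers, 825M.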
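
As a minimal sketch of how the line index files might be read, the Python below maps each FileID to its transcription. It assumes, beyond what the description states, that every row has exactly two tab-separated columns (FileID first, transcription second) and that each recording inside the archives is named after its FileID with a .wav extension. The helper names parse_line_index and wav_path and the sample rows are hypothetical, not part of the data set.

```python
import csv
import io
from pathlib import Path


def parse_line_index(lines):
    """Map FileID -> transcription from tab-separated lines.

    Assumption: column 0 is the FileID and column 1 is the transcription.
    """
    index = {}
    # QUOTE_NONE keeps any quote characters in the transcriptions untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        index[row[0].strip()] = row[1].strip()
    return index


def wav_path(audio_dir, file_id):
    """Build the expected audio path (assumed naming: <FileID>.wav)."""
    return Path(audio_dir) / f"{file_id}.wav"


# Made-up sample rows standing in for the contents of a line index file.
sample = io.StringIO("sample_0001\tહેલો\nsample_0002\tકેમ છો\n")
index = parse_line_index(sample)
```

With a real file, the same function could be called on an open handle, for example open("line_index_female.tsv", encoding="utf-8", newline=""), after which wav_path can be used to check whether the matching recording exists in the extracted archive.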

Copyright and license

  • License: Attribution-ShareAlike 4.0 International
  • Copyright notice: Copyright 2018, 2019 Google, Inc.

Citation information

  • Citation format

    @inproceedings{he-etal-2020-open,
      title = {Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems},
      author = {He, Fei and Chu, Shan-Hui Cathy and Kjartansson, Oddur and Rivera, Clara and Katanova, Anna and Gutkin, Alexander and Demirsahin, Isin and Johny, Cibu and Jansche, Martin and Sarin, Supheakmungkol and Pipatsrisawat, Knot},
      booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
      month = may,
      year = {2020},
      address = {Marseille, France},
      publisher = {European Language Resources Association (ELRA)},
      pages = {6494--6503},
      url = {https://www.aclweb.org/anthology/2020.lrec-1.800},
      ISBN = {979-10-95546-34-4},
    }
