
Abdullah500/IndicTTS-Bengali

Platform: Hugging Face; updated 2026-03-28; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Abdullah500/IndicTTS-Bengali
Resource description:

---
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 18331195266
    num_examples: 34269
  dataset_size: 18331195266
configs:
- config_name: default
  data_files:
  - split: train
    path: train-*
license: cc-by-4.0
task_categories:
- text-to-speech
language:
- bn
---

# Bengali Indic TTS Dataset

This dataset is derived from the Indic TTS Database project. It uses the Bengali monolingual recordings from both male and female speakers. It contains high-quality speech recordings with corresponding text transcriptions, which makes it suitable for text-to-speech (TTS) research and development.

## Dataset Details

- **Language**: Bengali
- **Audio Format**: WAV
- **Sampling Rate**: 48,000 Hz (48 kHz)
- **Speakers**: 4 (2 male and 2 female native Bengali speakers)
- **Content Type**: Monolingual Bengali utterances
- **Recording Quality**: Studio-quality recordings
- **Transcription**: Available for all audio files
- **Sources**: `bengali_fem_new`, `bengali_female_old`, `bengali_male_new`, `bengali_male_old`, `english_female_old`, `english_male_old`

## Dataset Source

This dataset is derived from the Indic TTS Database, a special speech corpus of Indian languages developed by the Speech Technology Consortium at IIT Madras. The original database covers 13 major languages of India. It contains 10,000+ spoken sentences/utterances for both monolingual and English recordings.

## License & Usage

This dataset is subject to the original Indic TTS license terms. Before using this dataset, please make sure you have read and agreed to the [License For Use of Indic TTS](https://www.iitm.ac.in/donlab/indictts/downloads/license.pdf).

## Acknowledgments

This dataset would not be possible without the work of the Speech Technology Consortium at IIT Madras. Special acknowledgment goes to:

- Speech Technology Consortium
- Department of Computer Science & Engineering and Electrical Engineering, IIT Madras
- Bhashini, MeitY
- Prof. Hema A Murthy & Prof. S Umesh

## Citation

If you use this dataset in your research or applications, please cite the original Indic TTS project:

```bibtex
@misc{indictts2023,
  title       = {Indic {TTS}: A Text-to-Speech Database for Indian Languages},
  author      = {Speech Technology Consortium and {Hema A Murthy} and {S Umesh}},
  year        = {2023},
  publisher   = {Indian Institute of Technology Madras},
  url         = {https://www.iitm.ac.in/donlab/indictts/},
  institution = {Department of Computer Science and Engineering and Electrical Engineering, IIT MADRAS}
}
```

## Contact

For any issues or queries about this Hugging Face version of the dataset, feel free to comment in the Community tab. For queries about the original Indic TTS database, please contact: smtiitm@gmail.com

## Original Database Access

The complete original database can be accessed at: https://www.iitm.ac.in/donlab/indictts/database

Note: The original database provides data in multiple Indian languages and variants. This Hugging Face dataset specifically contains the Bengali monolingual portion of that database.
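The single `train` split is about 18.3 GB (18,331,195,266 bytes for 34,269 examples, roughly 535 KB per example on average). Streaming is therefore a practical way to inspect it without downloading everything first. Below is a minimal sketch of that approach, with the following assumptions:

- It assumes the Hugging Face `datasets` library is installed.
- It assumes each decoded `audio` value is a dict with `array` and `sampling_rate` keys, which is the `datasets` 3.x convention. Newer releases may return a decoder object instead.
- Dropping the two `english_*` sources listed on the card is an illustrative choice for users who want only Bengali speech. The card does not prescribe it.
- The helper names are hypothetical.

```python
# Minimal sketch (hypothetical helpers, not part of the dataset card).
# Assumption: decoded audio is a dict {"array": ..., "sampling_rate": ...},
# as produced by the `datasets.Audio` feature in datasets 3.x.

REPO_ID = "Abdullah500/IndicTTS-Bengali"

# Source labels taken from the card's "Sources" list; the english_* labels
# are deliberately left out so only Bengali recordings are kept.
BENGALI_SOURCES = frozenset({
    "bengali_fem_new",
    "bengali_female_old",
    "bengali_male_new",
    "bengali_male_old",
})


def is_bengali(example: dict) -> bool:
    """Return True if the example's `source` column is a Bengali source."""
    return example["source"] in BENGALI_SOURCES


def duration_seconds(example: dict) -> float:
    """Clip duration in seconds: number of samples divided by sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def preview_bengali(max_examples: int = 5) -> None:
    """Stream the train split and print a few Bengali rows (needs network)."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields examples lazily instead of downloading ~18.3 GB.
    ds = load_dataset(REPO_ID, split="train", streaming=True)
    for example in ds.filter(is_bengali).take(max_examples):
        print(example["source"], f"{duration_seconds(example):.2f}s", example["text"][:40])
```

Calling `preview_bengali()` requires network access. It prints the source label, the clip duration, and the start of the transcription for the first few matching rows. At the card's 48 kHz sampling rate, a clip of 96,000 samples lasts 2 seconds.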
Provided by:
Abdullah500