DarshanaS/IndicAccentDb

Name: DarshanaS/IndicAccentDb
Creator: DarshanaS
Published: 2023-04-30 09:53:41
License: c-uda

Hugging Face · Updated 2023-04-30 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/DarshanaS/IndicAccentDb


Resource description:

---
license: c-uda
---

## 1. Introduction

Introducing a novel accent database, "IndicAccentDB", which satisfies the following requirements:

* **Gender balance:** The speech database should be a collection of a wide range of speakers, balancing male and female speakers, to display the characteristics of the speakers' speech.
* **Phonetically balanced uniform content:** To make the classification task simpler and help models distinguish the speakers, we built IndicAccentDB with uniform content: a collection of speech recordings of the Harvard sentences. These sentences gather intrinsic information by combining different phonemes and grammatically focused vocabulary, and they appropriately express accents in sentence-level discourse. The Harvard sentences recited by the speakers in the recordings (sample shown below) are available here: [Harvard Sentences](https://www.cs.columbia.edu/~hgs/audio/harvard.html)

  *The juice of lemons makes fine punch.*

  *The fish twisted and turned on the bent hook.*

* IndicAccentDB contains speech recordings in six non-native English accents: Gujarati, Hindi, Kannada, Malayalam, Tamil, and Telugu. We collected these six non-native accents from volunteers who had strong non-native English accents and were well-versed in speaking at least one Indian language. Each speaker was asked to recite the Harvard sentences. The Harvard sentences dataset consists of 72 sets of ten phonetically balanced sentences each, and the sentences are neither too short nor too long.

## 2. Dataset Usage

To use the dataset in your Python program, refer to the following script:

```python3
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub and load it.
accent_db = load_dataset("DarshanaS/IndicAccentDb")
```

## 3. Publications

1. [S. Darshana, H. Theivaprakasham, G. Jyothish Lal, B. Premjith, V. Sowmya and K. Soman, "MARS: A Hybrid Deep CNN-based Multi-Accent Recognition System for English Language," 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR), Hyderabad, India, 2022, pp. 1-6, doi: 10.1109/ICAITPR51569.2022.9844177.](https://ieeexplore.ieee.org/document/9844177)

Provider:

DarshanaS

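Summary of original information

Dataset overview

1. Introduction

IndicAccentDB is a novel accent database that satisfies the following requirements:

Gender balance: The database covers a wide range of speakers, balancing male and female speakers, to display the characteristics of the speakers' speech.
Phonetically balanced uniform content: To simplify the classification task and help models distinguish the speakers, IndicAccentDB was built with uniform content: speech recordings of the Harvard sentences. These sentences gather intrinsic information by combining different phonemes and grammatically focused vocabulary, and they appropriately express accents at the sentence level.
- Example sentences:
  - The juice of lemons makes fine punch.
  - The fish twisted and turned on the bent hook.
IndicAccentDB contains speech recordings in six non-native English accents: Gujarati, Hindi, Kannada, Malayalam, Tamil, and Telugu. The six accents were collected from volunteers who had strong non-native English accents and were well-versed in at least one Indian language. Each speaker was asked to recite the Harvard sentences. The Harvard sentences dataset consists of 72 sets of ten phonetically balanced sentences each, and the sentences are neither too short nor too long.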
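
The figures in the description (72 sets of ten sentences) imply 720 distinct Harvard sentences. The sketch below works through that arithmetic and shows one possible way to map a flat sentence index back to its set. The 0-based flat index and the helper name `harvard_position` are assumptions made for illustration; the source does not say how recordings are indexed, nor whether every speaker recorded all of the sentences.

```python
NUM_SETS = 72            # number of Harvard sentence sets, as stated in the description
SENTENCES_PER_SET = 10   # sentences per set, as stated in the description
TOTAL_SENTENCES = NUM_SETS * SENTENCES_PER_SET  # 72 * 10 = 720


def harvard_position(index):
    """Map a 0-based flat sentence index to a 1-based (set, sentence) pair.

    Hypothetical helper: the flat indexing scheme is an assumption,
    not something the dataset description specifies.
    """
    if not 0 <= index < TOTAL_SENTENCES:
        raise ValueError(f"index must be in [0, {TOTAL_SENTENCES})")
    set_offset, sentence_offset = divmod(index, SENTENCES_PER_SET)
    return set_offset + 1, sentence_offset + 1
```

Under this scheme, index 0 maps to sentence 1 of set 1 and index 719 maps to sentence 10 of set 72.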

2. Dataset usage

To use the dataset in a Python program, import the loader with `from datasets import load_dataset` and then call `accent_db = load_dataset("DarshanaS/IndicAccentDb")`.
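
After loading, it can help to check which splits and columns the dataset actually provides, since the description does not list them. The following is a minimal sketch, assuming the `datasets` library is installed and the Hub is reachable. The helper names `load_accent_db` and `summarize_splits` are hypothetical; `num_rows` and `column_names` are attributes of `datasets.Dataset`.

```python
def load_accent_db(repo_id="DarshanaS/IndicAccentDb"):
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so summarize_splits can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def summarize_splits(dataset_dict):
    """Return {split_name: (num_rows, column_names)} for a DatasetDict-like mapping."""
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }
```

Calling `summarize_splits(load_accent_db())` returns the row count and column names for each split, which is a reasonable first check before writing any code that depends on a particular split or column name.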
