eduhk-compling/s11609150_Kaur_Sukhkamalpreet
Hugging Face. Updated 2026-02-05. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/eduhk-compling/s11609150_Kaur_Sukhkamalpreet
Description:
This dataset contains 58 Hindi sentences recorded on a smartphone in a natural setting. Each audio file is paired with two text versions: a standard Hindi transcript and a Romanized phonetic version. Data preparation addressed two issues. First, spectral filtering was applied to reduce typical mobile recording noise, improving clarity without altering speech quality. Second, common differences between spoken and written forms, such as jari raha versus the standard ja raha, are handled by providing both phonetic and orthographic transcriptions. This dual approach preserves the authentic speech while supplying standardized text, making the dataset suitable for speech technology development and linguistic research.
Provider:
eduhk-compling
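
The description mentions spectral filtering for noise reduction but does not say which filter was used. As an illustration only, the sketch below shows basic spectral subtraction, one common technique of this kind, written with NumPy. The function names, the parameter defaults, and the assumption that a separate noise-only clip is available are all hypothetical and are not taken from the dataset's actual preparation pipeline.

```python
import numpy as np


def _frames(x, frame_len, hop, window):
    """Slice x into overlapping windowed frames (requires len(x) >= frame_len)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * window


def spectral_subtract(signal, noise, frame_len=512, hop=256, floor=0.05):
    """Hypothetical helper: subtract an average noise magnitude spectrum.

    signal: 1-D float array containing speech plus noise.
    noise:  1-D float array of noise only (assumed to be available).
    floor:  fraction of the original magnitude kept as a lower bound,
            which limits "musical noise" artifacts.
    Assumes frame_len is a multiple of hop.
    """
    window = np.hanning(frame_len)

    # Average magnitude spectrum of the noise-only clip.
    noise_mag = np.abs(
        np.fft.rfft(_frames(noise, frame_len, hop, window), axis=1)
    ).mean(axis=0)

    # Zero-pad so every original sample is covered by overlapping frames.
    tail = (-len(signal)) % hop
    padded = np.pad(signal, (frame_len, frame_len + tail))
    spec = np.fft.rfft(_frames(padded, frame_len, hop, window), axis=1)

    # Subtract the noise estimate per bin, keep the noisy phase.
    mag = np.abs(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    gain = clean_mag / np.maximum(mag, 1e-12)
    frames = np.fft.irfft(spec * gain, n=frame_len, axis=1)

    # Weighted overlap-add back to the time domain.
    out = np.zeros(len(padded))
    norm = np.zeros(len(padded))
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + frame_len] += frame * window
        norm[start:start + frame_len] += window ** 2
    out /= np.maximum(norm, 1e-8)

    # Drop the padding so the output aligns with the input.
    return out[frame_len:frame_len + len(signal)]
```

In this sketch the `floor` parameter trades residual noise against distortion: lower values remove more noise but are more likely to alter the speech, which is the balance the description refers to.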



