thu-spmi/librispeech-phoneme-labels
Source: Hugging Face. Updated: 2025-12-23. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/thu-spmi/librispeech-phoneme-labels
Description:
---
license: apache-2.0
task_categories:
- automatic-speech-recognition
- translation
language:
- en
tags:
- lecture
pretty_name: d
size_categories:
- 10M<n<100M
---
# LibriSpeech IPA Phoneme Labels
This repository provides **IPA-based phoneme annotations and a pronunciation lexicon** for the LibriSpeech dataset.
All phoneme labels are converted from **CMU Pronouncing Dictionary (CMU-Dict)** phonemes into **IPA symbols** using deterministic rules, with the help of the following toolkit:
- https://pypi.org/project/pinyin-to-ipa
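As a rough illustration of what a deterministic CMU-Dict to IPA conversion can look like, here is a minimal sketch. The mapping table is a commonly used ARPAbet to IPA correspondence and is an assumption, not the exact rule set used to build this dataset; the function name `arpabet_to_ipa` and the stress handling (reduced symbols for unstressed `AH0` and `ER0`, stress digits dropped otherwise) are hypothetical choices. It does not call the toolkit linked above.

```python
# Assumed ARPAbet -> IPA table (a common convention; the dataset's own rules may differ).
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Assumed stress-specific overrides: unstressed AH and ER map to reduced vowels.
STRESS_OVERRIDES = {"AH0": "ə", "ER0": "ɚ"}


def arpabet_to_ipa(phones):
    """Convert a list of CMU-Dict phones (e.g. ["HH", "AH0", "L", "OW1"]) to IPA."""
    ipa = []
    for phone in phones:
        if phone in STRESS_OVERRIDES:
            ipa.append(STRESS_OVERRIDES[phone])
            continue
        base = phone.rstrip("012")  # drop the CMU-Dict stress digit, if any
        if base not in ARPABET_TO_IPA:
            raise ValueError(f"unknown CMU-Dict phone: {phone!r}")
        ipa.append(ARPABET_TO_IPA[base])
    return ipa


if __name__ == "__main__":
    print(" ".join(arpabet_to_ipa(["HH", "AH0", "L", "OW1"])))
```

Because the conversion is a pure table lookup, the same CMU-Dict pronunciation always yields the same IPA sequence, which is what "deterministic" means here.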
The data is intended for **phoneme-based ASR**, **phoneme-to-grapheme (P2G) and grapheme-to-phoneme (G2P) research**, **phoneme-level CTC and attention-based encoder-decoder (AED) models**, and **cross-lingual phoneme experiments**.
## Dataset Structure
- `train-clean-100-phoneme`
- `train-clean-360-phoneme`
- `train-other-500-phoneme`
- `dev-clean-phoneme`
- `dev-other-phoneme`
- `test-clean-phoneme`
- `test-other-phoneme`
- `lexicon.txt`
- `phone_list`
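
The file formats are not documented here, so the following is only a sketch of how `lexicon.txt` might be read. It assumes the common Kaldi-style layout, one entry per line with the word followed by whitespace-separated phonemes, and it keeps multiple pronunciations per word. The function name `parse_lexicon` and the sample entries are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Parse lexicon lines of the assumed form 'word ph1 ph2 ...'.

    Returns a dict mapping each word to a list of pronunciations,
    where each pronunciation is a list of phoneme symbols.
    """
    lexicon = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:  # skip blank lines and entries without phonemes
            continue
        word, phones = parts[0], parts[1:]
        lexicon[word].append(phones)
    return dict(lexicon)


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = ["hello h ə l oʊ", "hello h ɛ l oʊ", "", "world w ɝ l d"]
    for word, prons in parse_lexicon(sample).items():
        print(word, prons)
```

With a local copy of the file, the same function can be applied to an open file handle, for example `parse_lexicon(open("lexicon.txt", encoding="utf-8"))`, provided the file actually follows the assumed layout.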
Provider:
thu-spmi



