changelinglab/librispeech-segment

Name: changelinglab/librispeech-segment
Creator: changelinglab
Published: 2026-04-12 17:09:10
License: No description available

Hugging Face · Updated 2026-04-12 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/changelinglab/librispeech-segment

Resource description:

---
license: cc-by-4.0
language:
- en
pretty_name: LibriSpeech Segment
task_categories:
- automatic-speech-recognition
tags:
- speech
- phone-alignment
- segmentation
- english
size_categories:
- 100K<n<1M
---

# LibriSpeech Segment

English read-speech corpus with **phone-level time alignments** produced by the Montreal Forced Aligner (MFA). Suitable for training and evaluating phone recognition and phonetic segmentation models.

## Sources

- **Audio**: [LibriSpeech](https://www.openslr.org/12/) (OpenSLR 12) by Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur (2015).
- **Phone alignments**: [`anyspeech/librispeech_MFA_alignments`](https://huggingface.co/datasets/anyspeech/librispeech_MFA_alignments).

## Splits

| Split           | Utterances |
|-----------------|------------|
| train.clean.100 | 28,538     |
| train.clean.360 | 104,008    |
| train.other.500 | 148,645    |
| dev.clean       | 2,703      |
| dev.other       | 2,864      |
| test.clean      | 2,620      |
| test.other      | 2,938      |

Split labels follow the canonical LibriSpeech naming.

## Schema

| Column         | Type              | Description                                 |
|----------------|-------------------|---------------------------------------------|
| `utt_id`       | string            | Utterance id, e.g. `7635-105409-0022`       |
| `audio`        | Audio(16 kHz)     | Embedded waveform bytes (decoded on access) |
| `text`         | string            | Word-level transcript (uppercase)           |
| `phones`       | sequence[string]  | ARPABET phone tokens                        |
| `phone_starts` | sequence[float64] | Phone start times in seconds                |
| `phone_ends`   | sequence[float64] | Phone end times in seconds                  |
| `language`     | string            | `eng` (ISO 639-3)                           |
| `speaker_id`   | string            | LibriSpeech speaker id                      |
| `duration`     | float64           | Utterance duration in seconds               |
| `split`        | string            | LibriSpeech split label                     |

## Phone inventory

Phones use ARPABET symbols (e.g. `DH`, `EH`, `R`, `AE`, `OW`). Silences and pauses are marked with `[SIL]` intervals, which are kept in the alignment so that boundary models can learn from them. `[UNK]` may also appear for out-of-vocabulary (OOV) words.
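
To illustrate how the `phones`, `phone_starts`, and `phone_ends` columns fit together, here is a minimal sketch that expands one row's phone intervals into frame-level labels, a common input format for framewise phone recognition and segmentation models. It is not part of the dataset, and several choices are assumptions: the function name `frame_labels` is hypothetical, the 10 ms frame hop is an illustrative default rather than anything the dataset defines, each frame is assigned by its centre time, and frames not covered by any interval are filled with `[SIL]`.

```python
import math
from typing import List, Sequence

# Hypothetical frame hop (10 ms); the dataset itself does not define frames.
FRAME_HOP_S = 0.01


def frame_labels(
    phones: Sequence[str],
    starts: Sequence[float],
    ends: Sequence[float],
    duration: float,
    hop: float = FRAME_HOP_S,
    fill: str = "[SIL]",
) -> List[str]:
    """Expand phone intervals (in seconds) into one label per frame.

    Frame i takes the label of the interval [start, end) that contains its
    centre time (i + 0.5) * hop. Frames whose centre lies outside every
    interval keep `fill` (assumed here to be "[SIL]").
    """
    if not (len(phones) == len(starts) == len(ends)):
        raise ValueError("phones, phone_starts and phone_ends must have equal length")
    n_frames = int(round(duration / hop))
    labels = [fill] * n_frames
    for phone, start, end in zip(phones, starts, ends):
        if end < start:
            raise ValueError(f"interval for {phone!r} ends before it starts")
        # The centre (i + 0.5) * hop lies in [start, end)
        # exactly when ceil(start/hop - 0.5) <= i < ceil(end/hop - 0.5).
        first = max(0, math.ceil(start / hop - 0.5))
        last = min(n_frames, math.ceil(end / hop - 0.5))
        for i in range(first, last):
            labels[i] = phone
    return labels


if __name__ == "__main__":
    # Toy row shaped like the schema (values invented for illustration).
    row = {
        "phones": ["[SIL]", "DH", "EH", "[SIL]"],
        "phone_starts": [0.0, 0.02, 0.05, 0.09],
        "phone_ends": [0.02, 0.05, 0.09, 0.10],
        "duration": 0.10,
    }
    print(frame_labels(row["phones"], row["phone_starts"], row["phone_ends"], row["duration"]))
    # → ['[SIL]', '[SIL]', 'DH', 'DH', 'DH', 'EH', 'EH', 'EH', 'EH', '[SIL]']
```

A real row (for example, one loaded with the Hugging Face `datasets` library) can be passed the same way. Assigning frames by their centre keeps the result stable when a boundary falls exactly on a frame edge, where a direct comparison of `start` against `i * hop` would be sensitive to floating-point rounding.
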

## License

Released under the **CC BY 4.0** license, matching the original LibriSpeech audio.

## Citation

```bibtex
@inproceedings{panayotov2015librispeech,
  title={Librispeech: an asr corpus based on public domain audio books},
  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
  booktitle={2015 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  pages={5206--5210},
  year={2015},
  organization={IEEE}
}
```

Provider:

changelinglab
