changelinglab/cv-v1.0-segment
Source: Hugging Face. Updated 2026-04-12; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/changelinglab/cv-v1.0-segment
Resource description:
---
license: cc0-1.0
task_categories:
- automatic-speech-recognition
- audio-classification
language:
- ba
- be
- ca
- de
- en
- es
- fr
- it
- rw
- sw
pretty_name: CommonVoice v1 Phone-Segment Alignments
size_categories:
- 1M<n<10M
---
# CommonVoice v1 Phone-Segment Alignments
Phone-level time alignments for **10 languages** of Mozilla Common Voice,
packaged in a canonical segmentation schema with embedded 16 kHz audio. The
phone boundaries come from the [`charsiu/cv_ali`](https://huggingface.co/datasets/charsiu/cv_ali)
release of MFA alignments; the audio and transcripts come from
[Common Voice Corpus 13.0](https://commonvoice.mozilla.org/) (2023-03-09).
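
As a rough illustration of how the packaged rows might be consumed, the sketch below streams the dataset with the Hugging Face `datasets` library and filters by language. The repo id comes from the download link; the split name passed to `load_dataset` and the helper names `stream_split` and `only_language` are assumptions made for this example, not something the card confirms.

```python
from typing import Any, Dict, Iterable, Iterator

REPO_ID = "changelinglab/cv-v1.0-segment"  # repo id taken from the download link


def stream_split(split: str = "train") -> Iterable[Dict[str, Any]]:
    """Stream rows from the Hub instead of downloading every shard first.

    Assumption: the repo exposes a split with this name. The `split` column
    uses `train`/`val`/`test`, but the Hub split names may differ.
    """
    # Imported lazily so the helpers below work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


def only_language(rows: Iterable[Dict[str, Any]], lang: str) -> Iterator[Dict[str, Any]]:
    """Yield only the rows whose `language` column equals `lang`."""
    for row in rows:
        if row["language"] == lang:
            yield row


# Usage (requires network access and `pip install datasets`):
# for row in only_language(stream_split("train"), "sw"):
#     print(row["utt_id"], row["text"], row["phones"][:5])
#     break
```

Streaming avoids fetching the full set of parquet shards up front, which matters at this size. If the repo turns out to define per-language configurations, passing a configuration name to `load_dataset` would be cheaper than filtering row by row.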
## Dataset summary
| lang | train rows | train hrs | val rows | val hrs | test rows | test hrs |
|:----:|-----------:|----------:|---------:|--------:|----------:|---------:|
| en | 1,008,669 | 1,354.0 | 3,537 | 4.9 | 1,285 | 1.7 |
| rw | 909,252 | 1,115.5 | 0 | 0.0 | 0 | 0.0 |
| ca | 863,474 | 1,117.2 | 7,345 | 10.2 | 4,309 | 5.9 |
| de | 539,246 | 729.7 | 1,825 | 2.5 | 13 | 0.0 |
| fr | 508,782 | 605.1 | 1,912 | 2.4 | 242 | 0.3 |
| be | 317,391 | 360.6 | 1,627 | 2.0 | 143 | 0.2 |
| es | 277,324 | 343.9 | 1,076 | 1.4 | 50 | 0.1 |
| it | 162,430 | 208.3 | 354 | 0.5 | 0 | 0.0 |
| ba | 118,482 | 120.8 | 483 | 0.3 | 0 | 0.0 |
| sw | 29,194 | 39.3 | 4,757 | 6.3 | 697 | 0.9 |
| **TOTAL** | **4,734,244** | **5,994.5** | **22,916** | **30.5** | **6,739** | **9.1** |

**Grand total: 4,763,899 utterances / 6,034.1 hours of aligned speech** across the train, val, and test splits.
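
The grand total can be cross-checked from the per-split totals in the TOTAL row of the table. The short sketch below recomputes it and derives a mean utterance length; the variable names are chosen for this example only.

```python
# Per-split totals copied from the TOTAL row of the summary table.
SPLIT_TOTALS = {
    "train": {"rows": 4_734_244, "hours": 5_994.5},
    "val": {"rows": 22_916, "hours": 30.5},
    "test": {"rows": 6_739, "hours": 9.1},
}

grand_rows = sum(s["rows"] for s in SPLIT_TOTALS.values())
grand_hours = round(sum(s["hours"] for s in SPLIT_TOTALS.values()), 1)

# Mean utterance length in seconds, derived from the two grand totals.
mean_seconds = grand_hours * 3600 / grand_rows

print(f"{grand_rows:,} utterances, {grand_hours:,.1f} hours")
print(f"mean utterance length: {mean_seconds:.2f} s")
```

The mean works out to roughly 4.6 seconds per utterance, which is a quick sanity figure when estimating batch sizes or storage.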
## Schema
| field | type | notes |
|---------------|-----------------------------------------|-------|
| `utt_id` | `string` | CommonVoice clip stem (e.g. `common_voice_en_12345`) |
| `audio` | `Audio(sampling_rate=16000)` | mp3 bytes embedded in the parquet shards; resampled on decode |
| `text` | `string` | sentence from CV `{split}.tsv` |
| `phones` | `Sequence[string]` | IPA phone labels from MFA |
| `phone_starts`| `Sequence[float64]` | start time (seconds) of each phone |
| `phone_ends` | `Sequence[float64]` | end time (seconds) of each phone |
| `language` | `string` | ISO 639-1 code (`ba`, `be`, `ca`, ...) |
| `speaker_id` | `string` | CV `client_id` (SHA hash) |
| `duration` | `float64` | last phone end time (seconds) |
| `split` | `string` | `train`, `val`, or `test` |

Phone labels are MFA's IPA output. Silence intervals that carry an empty label in the source TextGrid are dropped, so consecutive phones may be separated by time gaps.
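
Because the alignment is stored as three parallel lists (`phones`, `phone_starts`, `phone_ends`), a common first step is to zip them into per-phone segments. The sketch below does that, locates the uncovered spans left by dropped silences, and maps a segment to sample indices at the 16 kHz rate given in the schema. The `Phone` type, the helper names, and the example row are hypothetical and invented for illustration.

```python
from typing import Any, Dict, List, NamedTuple, Tuple


class Phone(NamedTuple):
    label: str
    start: float  # seconds
    end: float    # seconds


def to_segments(row: Dict[str, Any]) -> List[Phone]:
    """Zip the three parallel columns into one list of phone segments."""
    phones, starts, ends = row["phones"], row["phone_starts"], row["phone_ends"]
    if not (len(phones) == len(starts) == len(ends)):
        raise ValueError("phones, phone_starts and phone_ends must have equal length")
    return [Phone(p, s, e) for p, s, e in zip(phones, starts, ends)]


def find_gaps(segments: List[Phone], tol: float = 1e-6) -> List[Tuple[float, float]]:
    """Return (start, end) spans not covered by any phone, e.g. dropped silences."""
    gaps = []
    cursor = 0.0
    for seg in segments:
        if seg.start - cursor > tol:
            gaps.append((cursor, seg.start))
        cursor = max(cursor, seg.end)
    return gaps


def sample_range(seg: Phone, sampling_rate: int = 16_000) -> Tuple[int, int]:
    """Convert a segment's start/end times to sample indices at the given rate."""
    return round(seg.start * sampling_rate), round(seg.end * sampling_rate)


# Hypothetical row, invented for illustration (not taken from the dataset).
row = {
    "phones": ["h", "aɪ"],
    "phone_starts": [0.25, 0.5],
    "phone_ends": [0.375, 0.75],
    "duration": 0.75,
}
segments = to_segments(row)
print(find_gaps(segments))        # [(0.0, 0.25), (0.375, 0.5)]
print(sample_range(segments[1]))  # (8000, 12000)
```

The sample indices can be used to slice a decoded waveform, assuming the audio has been decoded at 16 kHz as the `audio` field describes.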
## Sources & attribution
- **Audio & transcripts** — Mozilla Common Voice Corpus 13.0, released
2023-03-09, under CC0 1.0.
- **Phone-level alignments** — [`charsiu/cv_ali`](https://huggingface.co/datasets/charsiu/cv_ali),
produced with MFA (Montreal Forced Aligner).
## Citations
```bibtex
@inproceedings{ardila-etal-2020-common,
title = "{C}ommon {V}oice: A Massively-Multilingual Speech Corpus",
author = "Ardila, Rosana and
Branson, Megan and
Davis, Kelly and
Kohler, Michael and
Meyer, Josh and
Henretty, Michael and
Morais, Reuben and
Saunders, Lindsay and
Tyers, Francis and
Weber, Gregor",
booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2020.lrec-1.520",
pages = "4218--4222",
language = "English",
ISBN = "979-10-95546-34-4",
}
```
## License
Released under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/),
matching the upstream Common Voice and Charsiu alignment licenses.
Provider: changelinglab