Name: changelinglab/cv-v1.0-segment
Creator: changelinglab
Published: 2026-04-12 21:03:42
License: no description provided

Download link:

https://hf-mirror.com/datasets/changelinglab/cv-v1.0-segment

Resource description:

---
license: cc0-1.0
task_categories:
- automatic-speech-recognition
- audio-classification
language:
- ba
- be
- ca
- de
- en
- es
- fr
- it
- rw
- sw
pretty_name: CommonVoice v1 Phone-Segment Alignments
size_categories:
- 1M<n<10M
---

# CommonVoice v1 Phone-Segment Alignments

Phone-level time alignments for **10 languages** of Mozilla Common Voice, packaged in a canonical segmentation schema with embedded 16 kHz audio. The phone boundaries come from the [`charsiu/cv_ali`](https://huggingface.co/datasets/charsiu/cv_ali) release of MFA alignments; the audio and transcripts come from [Common Voice Corpus 13.0](https://commonvoice.mozilla.org/) (2023-03-09).

## Dataset summary

| lang | train rows | train hrs | val rows | val hrs | test rows | test hrs |
|:----:|-----------:|----------:|---------:|--------:|----------:|---------:|
| en | 1,008,669 | 1,354.0 | 3,537 | 4.9 | 1,285 | 1.7 |
| rw | 909,252 | 1,115.5 | 0 | 0.0 | 0 | 0.0 |
| ca | 863,474 | 1,117.2 | 7,345 | 10.2 | 4,309 | 5.9 |
| de | 539,246 | 729.7 | 1,825 | 2.5 | 13 | 0.0 |
| fr | 508,782 | 605.1 | 1,912 | 2.4 | 242 | 0.3 |
| be | 317,391 | 360.6 | 1,627 | 2.0 | 143 | 0.2 |
| es | 277,324 | 343.9 | 1,076 | 1.4 | 50 | 0.1 |
| it | 162,430 | 208.3 | 354 | 0.5 | 0 | 0.0 |
| ba | 118,482 | 120.8 | 483 | 0.3 | 0 | 0.0 |
| sw | 29,194 | 39.3 | 4,757 | 6.3 | 697 | 0.9 |
| **TOTAL** | **4,734,244** | **5,994.5** | **22,916** | **30.5** | **6,739** | **9.1** |

**Grand total: 4,763,899 utterances / 6,034.1 hours of aligned speech** across train/val/test.

## Schema

| field | type | notes |
|---------------|-----------------------------------------|-------|
| `utt_id` | `string` | CommonVoice clip stem (e.g. `common_voice_en_12345`) |
| `audio` | `Audio(sampling_rate=16000)` | mp3 bytes embedded in the parquet shards; resampled on decode |
| `text` | `string` | sentence from CV `{split}.tsv` |
| `phones` | `Sequence[string]` | IPA phone labels from MFA |
| `phone_starts`| `Sequence[float64]` | start time (seconds) of each phone |
| `phone_ends` | `Sequence[float64]` | end time (seconds) of each phone |
| `language` | `string` | ISO 639-1 code (`ba`, `be`, `ca`, ...) |
| `speaker_id` | `string` | CV `client_id` (SHA hash) |
| `duration` | `float64` | last phone end time (seconds) |
| `split` | `string` | `train`, `val`, or `test` |

Phone inventory is MFA's IPA output. Empty-label silence intervals from the source TextGrid are dropped.

## Sources & attribution

- **Audio & transcripts**: Mozilla Common Voice Corpus 13.0, released 2023-03-09, under CC0 1.0.
- **Phone-level alignments**: [`charsiu/cv_ali`](https://huggingface.co/datasets/charsiu/cv_ali), produced with MFA (Montreal Forced Aligner).

## Citations

```bibtex
@inproceedings{ardila-etal-2020-common,
  title = "{C}ommon {V}oice: A Massively-Multilingual Speech Corpus",
  author = "Ardila, Rosana and Branson, Megan and Davis, Kelly and Kohler, Michael and Meyer, Josh and Henretty, Michael and Morais, Reuben and Saunders, Lindsay and Tyers, Francis and Weber, Gregor",
  booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
  month = may,
  year = "2020",
  address = "Marseille, France",
  publisher = "European Language Resources Association",
  url = "https://aclanthology.org/2020.lrec-1.520",
  pages = "4218--4222",
  language = "English",
  ISBN = "979-10-95546-34-4",
}
```

## License

Released under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/), matching the upstream Common Voice and Charsiu alignment licenses.
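As a rough illustration of how the schema fields relate to each other, the sketch below converts the per-phone times in `phone_starts` and `phone_ends` into sample-index spans at the 16 kHz rate given for the `audio` field, and estimates how much of `duration` is covered by labelled phones. The helper names `phone_sample_spans` and `speech_coverage` are hypothetical, and the example row is synthetic (its values are invented for illustration, not taken from the dataset). It works on a plain dictionary shaped like one row, so it makes no assumption about how the rows are loaded.

```python
from typing import Dict, List, Tuple

SAMPLE_RATE = 16_000  # sampling rate stated for the `audio` field in the schema


def phone_sample_spans(row: Dict, sample_rate: int = SAMPLE_RATE) -> List[Tuple[str, int, int]]:
    """Convert per-phone start/end times (seconds) into (label, start, end) sample spans."""
    phones = row["phones"]
    starts = row["phone_starts"]
    ends = row["phone_ends"]
    # The three sequences are parallel, so their lengths must agree.
    if not (len(phones) == len(starts) == len(ends)):
        raise ValueError("phones, phone_starts and phone_ends must have equal length")
    spans = []
    for label, start, end in zip(phones, starts, ends):
        if end < start:
            raise ValueError(f"phone {label!r} ends before it starts")
        spans.append((label, round(start * sample_rate), round(end * sample_rate)))
    return spans


def speech_coverage(row: Dict) -> float:
    """Fraction of `duration` covered by labelled phones.

    Empty-label silence intervals are dropped in this dataset, so the value
    can be below 1.0 even for a well-aligned utterance.
    """
    voiced = sum(end - start for start, end in zip(row["phone_starts"], row["phone_ends"]))
    return voiced / row["duration"] if row["duration"] > 0 else 0.0


# Synthetic row for illustration only; these values are invented.
example_row = {
    "utt_id": "common_voice_en_12345",
    "text": "hi",
    "phones": ["h", "aɪ"],
    "phone_starts": [0.25, 0.5],
    "phone_ends": [0.5, 1.0],
    "language": "en",
    "duration": 1.0,
    "split": "train",
}

spans = phone_sample_spans(example_row)
coverage = speech_coverage(example_row)
print(spans)     # [('h', 4000, 8000), ('aɪ', 8000, 16000)]
print(coverage)  # 0.75
```

The sample spans can be used to slice a decoded waveform array per phone, provided the audio really is decoded at 16 kHz; with a different decode rate, pass that rate as `sample_rate`.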

Use cases: