
meetween/mumospee_v1_fix

Hugging Face, updated 2026-03-10, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/meetween/mumospee_v1_fix
Resource description:
---
license: cc-by-4.0
---

## Dataset Statistics

> Generated from: train, test, validation/

### Overview (All Splits Combined)

| Metric | Value |
|--------|-------|
| Total samples | 53,983,241 |
| Total audio duration | 121,957h 08m 34.5s (121,957.1 hours) |
| Average duration per sample | 8.13s |
| Avg transcript length | 16.5 words |
| Total parquet shards | 29 |

### Per-Split Overview

| Split | # Samples | Duration | Avg Duration | Avg Words | Shards |
|-------|----------:|---------:|-------------:|----------:|-------:|
| `train` | 53,319,102 | 120,878h 42m 30.2s (120,878.7h) | 8.16s | 16.5 | 27 |
| `test` | 341,118 | 547h 25m 08.3s (547.4h) | 5.78s | 10.3 | 1 |
| `validation` | 323,021 | 531h 00m 56.0s (531.0h) | 5.92s | 10.4 | 1 |

### Language Distribution

| Language | train samples | train % | test samples | test % | validation samples | validation % | Total samples | Total % | Total Duration | Total Dur % |
|-------|----------:|---------:|----------:|---------:|----------:|---------:|----------:|---------:|---------------:|------------:|
| `en` | 28,643,120 | 53.72% | 266,720 | 78.19% | 248,203 | 76.84% | 29,158,043 | 54.01% | 66,374h 54m 20.5s | 54.42% |
| `zh` | 19,969,319 | 37.45% | 0 | 0.00% | 0 | 0.00% | 19,969,319 | 36.99% | 49,922h 33m 08.9s | 40.93% |
| `ja` | 869,665 | 1.63% | 0 | 0.00% | 0 | 0.00% | 869,665 | 1.61% | 1,715h 27m 28.6s | 1.41% |
| `de` | 841,219 | 1.58% | 13,511 | 3.96% | 13,511 | 4.18% | 868,241 | 1.61% | 1,751h 41m 54.2s | 1.44% |
| `fr` | 777,904 | 1.46% | 14,760 | 4.33% | 14,760 | 4.57% | 807,424 | 1.50% | 1,607h 08m 32.1s | 1.32% |
| `es` | 165,080 | 0.31% | 13,221 | 3.88% | 13,221 | 4.09% | 191,522 | 0.35% | 141h 08m 41.8s | 0.12% |
| `it` | 123,812 | 0.23% | 8,183 | 2.40% | 8,940 | 2.77% | 140,935 | 0.26% | 56h 31m 19.1s | 0.05% |
| `cs` | 106,037 | 0.20% | 0 | 0.00% | 0 | 0.00% | 106,037 | 0.20% | 0.00s | 0.00% |
| `et` | 100,734 | 0.19% | 1,571 | 0.46% | 1,576 | 0.49% | 103,881 | 0.19% | 8h 57m 04.6s | 0.01% |
| `pl` | 102,193 | 0.19% | 0 | 0.00% | 0 | 0.00% | 102,193 | 0.19% | 0.00s | 0.00% |
| `sl` | 100,710 | 0.19% | 360 | 0.11% | 509 | 0.16% | 101,579 | 0.19% | 2h 35m 09.0s | 0.00% |
| `fi` | 100,236 | 0.19% | 0 | 0.00% | 0 | 0.00% | 100,236 | 0.19% | 0.00s | 0.00% |
| `sv` | 99,891 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,891 | 0.19% | 0.00s | 0.00% |
| `el` | 99,761 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,761 | 0.18% | 0.00s | 0.00% |
| `pt` | 99,487 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,487 | 0.18% | 0.00s | 0.00% |
| `ro` | 99,411 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,411 | 0.18% | 0.00s | 0.00% |
| `nl` | 99,400 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,400 | 0.18% | 0.00s | 0.00% |
| `hu` | 99,143 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,143 | 0.18% | 0.00s | 0.00% |
| `lt` | 99,078 | 0.19% | 0 | 0.00% | 0 | 0.00% | 99,078 | 0.18% | 0.00s | 0.00% |
| `da` | 98,868 | 0.19% | 0 | 0.00% | 0 | 0.00% | 98,868 | 0.18% | 0.00s | 0.00% |
| `hr` | 97,028 | 0.18% | 0 | 0.00% | 0 | 0.00% | 97,028 | 0.18% | 0.00s | 0.00% |
| `lv` | 92,504 | 0.17% | 1,629 | 0.48% | 1,125 | 0.35% | 95,258 | 0.18% | 4h 55m 19.3s | 0.00% |
| `mt` | 94,360 | 0.18% | 0 | 0.00% | 0 | 0.00% | 94,360 | 0.17% | 0.00s | 0.00% |
| `sk` | 92,345 | 0.17% | 0 | 0.00% | 0 | 0.00% | 92,345 | 0.17% | 0.00s | 0.00% |
| `ko` | 92,184 | 0.17% | 0 | 0.00% | 0 | 0.00% | 92,184 | 0.17% | 217h 09m 58.0s | 0.18% |
| `bg` | 89,209 | 0.17% | 0 | 0.00% | 0 | 0.00% | 89,209 | 0.17% | 0.00s | 0.00% |
| `ca` | 54,255 | 0.10% | 12,730 | 3.73% | 12,730 | 3.94% | 79,715 | 0.15% | 119h 48m 09.3s | 0.10% |
| `fa` | 4,348 | 0.01% | 3,445 | 1.01% | 3,445 | 1.07% | 11,238 | 0.02% | 14h 20m 32.8s | 0.01% |
| `ar` | 2,776 | 0.01% | 1,695 | 0.50% | 1,758 | 0.54% | 6,229 | 0.01% | 5h 35m 01.6s | 0.00% |
| `mn` | 2,018 | 0.00% | 1,759 | 0.52% | 1,761 | 0.55% | 5,538 | 0.01% | 8h 21m 37.1s | 0.01% |
| `id` | 1,243 | 0.00% | 844 | 0.25% | 792 | 0.25% | 2,879 | 0.01% | 2h 58m 58.8s | 0.00% |
| `cy` | 763 | 0.00% | 690 | 0.20% | 690 | 0.21% | 2,143 | 0.00% | 3h 01m 18.6s | 0.00% |
| `nn` | 426 | 0.00% | 0 | 0.00% | 0 | 0.00% | 426 | 0.00% | 0.00s | 0.00% |
| `la` | 289 | 0.00% | 0 | 0.00% | 0 | 0.00% | 289 | 0.00% | 0.00s | 0.00% |
| `ru` | 113 | 0.00% | 0 | 0.00% | 0 | 0.00% | 113 | 0.00% | 0.00s | 0.00% |
| `he` | 66 | 0.00% | 0 | 0.00% | 0 | 0.00% | 66 | 0.00% | 0.00s | 0.00% |
| `sq` | 40 | 0.00% | 0 | 0.00% | 0 | 0.00% | 40 | 0.00% | 0.00s | 0.00% |
| `tr` | 35 | 0.00% | 0 | 0.00% | 0 | 0.00% | 35 | 0.00% | 0.00s | 0.00% |
| `gl` | 15 | 0.00% | 0 | 0.00% | 0 | 0.00% | 15 | 0.00% | 0.00s | 0.00% |
| `uk` | 10 | 0.00% | 0 | 0.00% | 0 | 0.00% | 10 | 0.00% | 0.00s | 0.00% |
| `af` | 2 | 0.00% | 0 | 0.00% | 0 | 0.00% | 2 | 0.00% | 0.00s | 0.00% |
| `jw` | 1 | 0.00% | 0 | 0.00% | 0 | 0.00% | 1 | 0.00% | 0.00s | 0.00% |
| `ur` | 1 | 0.00% | 0 | 0.00% | 0 | 0.00% | 1 | 0.00% | 0.00s | 0.00% |
| `sr` | 1 | 0.00% | 0 | 0.00% | 0 | 0.00% | 1 | 0.00% | 0.00s | 0.00% |
| `hy` | 1 | 0.00% | 0 | 0.00% | 0 | 0.00% | 1 | 0.00% | 0.00s | 0.00% |
| `no` | 1 | 0.00% | 0 | 0.00% | 0 | 0.00% | 1 | 0.00% | 0.00s | 0.00% |

### Tag / Source Distribution

| Source | train samples | train % | test samples | test % | validation samples | validation % | Total samples | Total % | Total Duration | Total Dur % |
|-------|----------:|---------:|----------:|---------:|----------:|---------:|----------:|---------:|---------------:|------------:|
| `Emilia` | 40,237,834 | 75.47% | 0 | 0.00% | 0 | 0.00% | 40,237,834 | 74.54% | 101,585h 04m 02.8s | 83.30% |
| `GigaSpeech` | 5,053,116 | 9.48% | 0 | 0.00% | 0 | 0.00% | 5,053,116 | 9.36% | 6,297h 24m 07.6s | 5.16% |
| `CoVoST` | 3,591,777 | 6.74% | 290,706 | 85.22% | 288,492 | 89.31% | 4,170,975 | 7.73% | 6,519h 01m 42.7s | 5.35% |
| `MOSEL` | 2,300,046 | 4.31% | 0 | 0.00% | 0 | 0.00% | 2,300,046 | 4.26% | 0.00s | 0.00% |
| `PeopleSpeech` | 1,501,271 | 2.82% | 34,898 | 10.23% | 18,622 | 5.76% | 1,554,791 | 2.88% | 5,987h 42m 22.5s | 4.91% |
| `LibriTTS` | 353,817 | 0.66% | 9,955 | 2.92% | 10,340 | 3.20% | 374,112 | 0.69% | 585h 37m 48.6s | 0.48% |
| `Librispeech` | 281,241 | 0.53% | 5,559 | 1.63% | 5,567 | 1.72% | 292,367 | 0.54% | 982h 18m 30.3s | 0.81% |

### License Distribution

| License | train samples | train % | test samples | test % | validation samples | validation % | Total samples | Total % |
|-------|----------:|---------:|----------:|---------:|----------:|---------:|----------:|---------:|
| `CC-BY-4.0` | 43,172,938 | 80.97% | 15,514 | 4.55% | 15,907 | 4.92% | 43,204,359 | 80.03% |
| `unknown` | 5,053,116 | 9.48% | 0 | 0.00% | 0 | 0.00% | 5,053,116 | 9.36% |
| `CC0` | 3,591,777 | 6.74% | 290,706 | 85.22% | 288,492 | 89.31% | 4,170,975 | 7.73% |
| `CC-BY;CC-BY-SA` | 1,501,271 | 2.82% | 34,898 | 10.23% | 18,622 | 5.76% | 1,554,791 | 2.88% |

### Example usage

```python
# pip install datasets
from datasets import load_dataset

# ── Load all splits at once ───────────────────────────────────────────────────
dataset = load_dataset("meetween/mumospee")
print(dataset)
# DatasetDict({
#     train: Dataset({features: [...], num_rows: ...})
#     test: Dataset({features: [...], num_rows: ...})
#     validation: Dataset({features: [...], num_rows: ...})
# })

# ── Load a specific split ─────────────────────────────────────────────────────
train_data = load_dataset("meetween/mumospee", split="train")
test_data = load_dataset("meetween/mumospee", split="test")
validation_data = load_dataset("meetween/mumospee", split="validation")
```

### Notes

- `train`: 0 rows with unparseable duration (excluded from duration stats)
- `test`: 0 rows with unparseable duration (excluded from duration stats)
- `validation`: 0 rows with unparseable duration (excluded from duration stats)
- Stats generated in 370.7s total
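The average durations reported in the statistics can be cross-checked from the sample counts and total durations. The sketch below is illustrative only: `duration_to_seconds` is a hypothetical helper written for this example (it is not part of the dataset tooling), and the figures are copied from the overview and per-split tables. Under these assumptions, the reported averages appear to be total duration divided by all samples, including sources such as `MOSEL` that are listed with a duration of `0.00s`.

```python
import re


def duration_to_seconds(text: str) -> float:
    """Convert a string such as '121,957h 08m 34.5s' or '0.00s' to seconds.

    Hypothetical helper for this example; the hour and minute parts are optional.
    """
    match = re.fullmatch(r"(?:([\d,]+)h\s+)?(?:(\d+)m\s+)?([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = match.groups()
    total = float(seconds)
    if minutes:
        total += 60 * int(minutes)
    if hours:
        total += 3600 * int(hours.replace(",", ""))
    return total


# (sample count, total duration) copied from the tables in the dataset card.
stats = {
    "all": (53_983_241, "121,957h 08m 34.5s"),
    "train": (53_319_102, "120,878h 42m 30.2s"),
    "test": (341_118, "547h 25m 08.3s"),
    "validation": (323_021, "531h 00m 56.0s"),
}

averages = {
    name: duration_to_seconds(duration) / samples
    for name, (samples, duration) in stats.items()
}

for name, average in averages.items():
    print(f"{name}: {average:.2f}s")
# all: 8.13s
# train: 8.16s
# test: 5.78s
# validation: 5.92s
```

The recomputed values agree with the averages in the card to two decimal places, and the three split counts sum to the combined total of 53,983,241 samples.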
Provider:
meetween