shangeth/libriasr-mimi-codes

Name: shangeth/libriasr-mimi-codes
Creator: shangeth
Published: 2026-03-12 11:29:20
License: CC BY 4.0 (per the dataset card; the listing page itself gives no license description)

Source: Hugging Face. Updated 2026-03-12; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/shangeth/libriasr-mimi-codes


Dataset description:

---
language:
- en
license: cc-by-4.0
tags:
- audio
- text-to-speech
- mimi
- librispeech
- multi-speaker
- speech-synthesis
- codec
task_categories:
- text-to-speech
pretty_name: LibriSpeech ASR — Kyutai Mimi Encoded
size_categories:
- 100K<n<1M
---

# LibriSpeech ASR — Kyutai Mimi Encoded

[LibriSpeech ASR](https://www.openslr.org/12) (train.clean.100) pre-encoded with the [Kyutai Mimi](https://huggingface.co/kyutai/mimi) neural audio codec.

Instead of raw waveforms, every utterance is stored as a compact matrix of discrete codec tokens. This format is ready to use directly in any language-model-style audio generation pipeline without needing a GPU encoder at training time.

## What's inside

```
manifest.jsonl          # metadata: one JSON record per utterance
spk_index.json          # speaker-to-utterance index: { "speaker_id": [idx, idx, ...] }
shards/
├── shard_0000.pt       # packed dict of { idx -> (8, L) int16 code tensor }
├── shard_0001.pt
└── ...
```

Each `manifest.jsonl` record:

```json
{
  "idx": 0,
  "text": "He was in a confused state of mind.",
  "codes_file": "shards/shard_0000.pt:0",
  "speaker_id": "1234",
  "n_frames": 198
}
```

`spk_index.json` maps each speaker ID to the list of utterance indices for that speaker, useful for sampling reference audio in speaker-conditioned tasks.

## Dataset details

| | |
|---|---|
| Source | [LibriSpeech ASR train.clean.100](https://www.openslr.org/12) |
| Speakers | ~251 |
| Utterances | ~28,000 |
| Total duration | ~100 hours |
| Codec | [Kyutai Mimi](https://huggingface.co/kyutai/mimi) |
| Codec sample rate | 24,000 Hz |
| Codec frame rate | 12.5 fps |
| Codebooks | 8 |
| Token dtype | int16 |
| License | CC BY 4.0 |

## What you can use this for

- Multi-speaker / voice-cloning TTS research
- Speaker-conditioned codec language models
- Speaker representation learning
- Audio tokenization benchmarks
- Any task that benefits from a diverse, multi-speaker English speech corpus in discrete token form
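
## Reading a record (illustrative sketch)

The file layout and manifest fields described above suggest a simple access pattern: parse a manifest record, split `codes_file` into a shard path and a key, and look the tensor up in the shard. The following is a minimal sketch under stated assumptions, not the author's loader. The helper names (`parse_codes_file`, `duration_seconds`, `load_manifest`, `sample_reference`, `load_codes`) are hypothetical, and it is an assumption that the integer after the colon in `codes_file` is the key into the shard dict and that shards were written with `torch.save`.

```python
import json
import random

FRAME_RATE_HZ = 12.5  # codec frame rate stated in the dataset card


def parse_codes_file(codes_file):
    """Split 'shards/shard_0000.pt:0' into ('shards/shard_0000.pt', 0)."""
    path, _, key = codes_file.rpartition(":")
    return path, int(key)


def duration_seconds(n_frames):
    """Approximate utterance duration implied by the frame count."""
    return n_frames / FRAME_RATE_HZ


def load_manifest(lines):
    """Parse an iterable of JSONL lines into a dict keyed by 'idx'."""
    records = (json.loads(line) for line in lines if line.strip())
    return {rec["idx"]: rec for rec in records}


def sample_reference(spk_index, speaker_id, exclude_idx, rng=random):
    """Pick another utterance index from the same speaker, or None."""
    candidates = [i for i in spk_index[speaker_id] if i != exclude_idx]
    return rng.choice(candidates) if candidates else None


def load_codes(record, root="."):
    """Load the (8, L) int16 code tensor for one record (requires PyTorch)."""
    import os
    import torch  # imported lazily so the helpers above work without it

    path, key = parse_codes_file(record["codes_file"])
    shard = torch.load(os.path.join(root, path), map_location="cpu")
    # Assumption: the shard dict is keyed by the integer after ':'.
    return shard[key]


# The example record from the dataset card.
record = json.loads(
    '{"idx": 0, "text": "He was in a confused state of mind.", '
    '"codes_file": "shards/shard_0000.pt:0", "speaker_id": "1234", '
    '"n_frames": 198}'
)
print(parse_codes_file(record["codes_file"]))  # ('shards/shard_0000.pt', 0)
print(duration_seconds(record["n_frames"]))    # 15.84
```

With 8 codebooks at 12.5 frames per second, each second of audio corresponds to 100 tokens, so `n_frames * 8` gives the token count of an utterance. For speaker-conditioned training, `sample_reference` could be fed the parsed contents of `spk_index.json` to draw a second utterance from the same speaker as the target.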

Provider:

shangeth
