
shangeth/libriasr-mimi-codes

Source: Hugging Face · Updated: 2026-03-12 · Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/shangeth/libriasr-mimi-codes
Description:
---
language:
- en
license: cc-by-4.0
tags:
- audio
- text-to-speech
- mimi
- librispeech
- multi-speaker
- speech-synthesis
- codec
task_categories:
- text-to-speech
pretty_name: LibriSpeech ASR — Kyutai Mimi Encoded
size_categories:
- 100K<n<1M
---

# LibriSpeech ASR — Kyutai Mimi Encoded

[LibriSpeech ASR](https://www.openslr.org/12) (train.clean.100) pre-encoded with the [Kyutai Mimi](https://huggingface.co/kyutai/mimi) neural audio codec.

Instead of raw waveforms, every utterance is stored as a compact matrix of discrete codec tokens. This format is ready to use directly in any language-model-style audio generation pipeline without needing a GPU encoder at training time.

## What's inside

```
manifest.jsonl        # metadata: one JSON record per utterance
spk_index.json        # { "speaker_id": [idx, idx, ...] }: speaker-to-utterance index
shards/
├── shard_0000.pt     # packed dict of { idx -> (8, L) int16 code tensor }
├── shard_0001.pt
└── ...
```

Each `manifest.jsonl` record:

```json
{
  "idx": 0,
  "text": "He was in a confused state of mind.",
  "codes_file": "shards/shard_0000.pt:0",
  "speaker_id": "1234",
  "n_frames": 198
}
```

`spk_index.json` maps each speaker ID to the list of utterance indices for that speaker, useful for sampling reference audio in speaker-conditioned tasks.

## Dataset details

| Property | Value |
|---|---|
| Source | [LibriSpeech ASR train.clean.100](https://www.openslr.org/12) |
| Speakers | ~251 |
| Utterances | ~28,000 |
| Total duration | ~100 hours |
| Codec | [Kyutai Mimi](https://huggingface.co/kyutai/mimi) |
| Codec sample rate | 24,000 Hz |
| Codec frame rate | 12.5 fps |
| Codebooks | 8 |
| Token dtype | int16 |
| License | CC BY 4.0 |

## What you can use this for

- Multi-speaker / voice-cloning TTS research
- Speaker-conditioned codec language models
- Speaker representation learning
- Audio tokenization benchmarks
- Any task that benefits from a diverse, multi-speaker English speech corpus in discrete token form
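
The file layout described above can be consumed roughly as in the following sketch. It is illustrative only and rests on assumptions not stated in the card: that the integer after the colon in `codes_file` is the key into the shard dict, and that shards open with plain `torch.load`. The helper names (`parse_codes_ref`, `duration_seconds`, `pick_reference`, `load_codes`) and the toy speaker index are hypothetical.

```python
import json
import random

FRAME_RATE_HZ = 12.5  # codec frame rate listed in the dataset details


def parse_codes_ref(codes_file):
    """Split 'shards/shard_0000.pt:0' into ('shards/shard_0000.pt', 0)."""
    path, _, key = codes_file.rpartition(":")
    return path, int(key)


def duration_seconds(n_frames):
    """Convert a codec frame count to seconds at 12.5 frames per second."""
    return n_frames / FRAME_RATE_HZ


def pick_reference(spk_index, speaker_id, exclude_idx, rng):
    """Pick a different utterance index by the same speaker, or None."""
    candidates = [i for i in spk_index.get(speaker_id, []) if i != exclude_idx]
    return rng.choice(candidates) if candidates else None


def load_codes(root, codes_file):
    """Load one (8, L) int16 code tensor.

    Assumption: the integer after ':' is the key into the shard dict,
    and the shard can be opened with a plain torch.load call.
    """
    import torch  # imported lazily so the helpers above work without PyTorch

    path, key = parse_codes_ref(codes_file)
    shard = torch.load(f"{root}/{path}", map_location="cpu")
    return shard[key]


# The example manifest record shown in the dataset card.
record = json.loads(
    '{"idx": 0, "text": "He was in a confused state of mind.", '
    '"codes_file": "shards/shard_0000.pt:0", "speaker_id": "1234", '
    '"n_frames": 198}'
)

shard_path, shard_key = parse_codes_ref(record["codes_file"])
seconds = duration_seconds(record["n_frames"])  # 198 frames / 12.5 fps

# Toy stand-in for spk_index.json; the indices 17 and 42 are made up.
toy_spk_index = {"1234": [0, 17, 42]}
rng = random.Random(0)
reference_idx = pick_reference(
    toy_spk_index, record["speaker_id"], record["idx"], rng
)
```

With a local copy of the dataset, `load_codes("path/to/dataset", record["codes_file"])` would then return the code tensor for this utterance, provided the key assumption holds. Excluding the target utterance when sampling a reference avoids conditioning a model on the same audio it is asked to generate.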
Provider:
shangeth