Pranavz/emilia-en-mimi-q8-s4096-dynamic-20260329a-public
Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Pranavz/emilia-en-mimi-q8-s4096-dynamic-20260329a-public
Description:
---
pretty_name: emilia-en-mimi-q8-s4096-dynamic-20260329a-public
language:
- en
task_categories:
- text-to-speech
size_categories:
- n>1M
---
# emilia-en-mimi-q8-s4096-dynamic-20260329a-public
A frozen, pretokenized, model-ready copy of the English portion of Emilia for TinyAya + Mimi training.
## Layout
- `train/lang=en/*.parquet`
- optional `validation/lang=en/*.parquet`
- optional `test/lang=en/*.parquet`
- `dataset_manifest.json`
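
As a minimal sketch of how this layout could be consumed, the helper below walks a local copy of the repo and collects the parquet shards for each split that is actually present. The function name `find_split_files` and the `root` argument are hypothetical, and the code assumes only the directory pattern listed above, not any particular column schema.

```python
from pathlib import Path

# Split directory names taken from the layout above; validation and test are optional.
SPLITS = ("train", "validation", "test")


def find_split_files(root, lang="en"):
    """Map each split that exists under `root` to its sorted parquet shard paths."""
    found = {}
    for split in SPLITS:
        pattern = f"{split}/lang={lang}/*.parquet"
        shards = sorted(str(p) for p in Path(root).glob(pattern))
        if shards:  # skip optional splits that are absent or empty
            found[split] = shards
    return found
```

The returned mapping has the split-to-file-list shape that `datasets.load_dataset("parquet", data_files=...)` accepts, so it can be passed along directly if the `datasets` library is in use.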
## Selection
- source dataset: `amphion/Emilia-Dataset`
- data files: `Emilia/EN/*.tar`
- source split: `train`
- quantizers: `8`
- train samples: `18136270`
- validation samples: `0`
- test samples: `0`
- min seconds: `1.0`
- max seconds: `30.0`
## Audio Codec
- backend: `mimi`
- source: `hf_pretrained`
- model: `kyutai/mimi`
- sample rate: `24000`
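
For a rough sense of sequence length, the sketch below estimates how many Mimi frames and codes a clip of a given duration produces. It relies on an assumption that is not stated in this card: that Mimi emits 12.5 frames per second, which is a hop of 1920 samples at 24000 Hz. The function name `approx_mimi_codes` is hypothetical, and how the codes are actually arranged in the stored rows is not specified here.

```python
SAMPLE_RATE = 24000   # from the Audio Codec section
QUANTIZERS = 8        # from the Selection section
HOP_SAMPLES = 1920    # assumption: 24000 Hz / 12.5 frames per second


def approx_mimi_codes(seconds, quantizers=QUANTIZERS):
    """Return (frames, total_codes) for a clip, rounding partial frames up."""
    samples = round(seconds * SAMPLE_RATE)
    frames = -(-samples // HOP_SAMPLES)  # ceiling division
    return frames, frames * quantizers
```

Under these assumptions, the `max seconds` limit of `30.0` corresponds to 375 frames, or 3000 codes across 8 quantizers.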
## Notes
This repo stores pretokenized training artifacts, not raw audio. Use `dataset_manifest.json`
as the immutable split fingerprint for ablation reproducibility.
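
One way to use the manifest as a fingerprint is to hash the file and record the digest alongside each experiment. The sketch below does only that. It hashes the raw bytes with SHA-256 and makes no assumption about the manifest's internal fields, which this card does not document. The name `manifest_fingerprint` is hypothetical.

```python
import hashlib


def manifest_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the manifest file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large manifests do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two ablation runs that log the same digest were trained against byte-identical manifests. A differing digest signals that the split definition changed between runs.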
Provider:
Pranavz



