
treadon/speech-dac-16khz-2cb

Hugging Face | Updated 2026-04-05 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/treadon/speech-dac-16khz-2cb
Description:
---
language:
- en
license: cc-by-4.0
task_categories:
- text-to-speech
tags:
- dac
- audio-tokens
- speech
- tts
- codebook
- descript-audio-codec
- librispeech
- 16khz
pretty_name: Speech DAC Tokens 16kHz (2 Codebooks)
size_categories:
- 10K<n<100K
---

# Speech DAC Tokens 16kHz (2 Codebooks)

Pre-tokenized speech dataset using [DAC](https://github.com/descriptinc/descript-audio-codec) at 16kHz with 2 codebooks. Optimized for speech TTS training: 16kHz captures the full speech frequency range without wasting capacity on inaudible frequencies.

## Why 16kHz?

- **Speech lives below 8kHz**: 16kHz sample rate is sufficient (Nyquist)
- **50 tokens/sec per codebook** vs 87 at 44kHz: shorter sequences, faster training
- **2 codebooks at 16kHz produce intelligible speech**: verified by listening tests
- **No resampling needed**: LibriSpeech is natively 16kHz

## Dataset Summary

| Stat | Value |
|------|-------|
| **Total samples** | 132,479 |
| **Total audio** | ~464 hours |
| **Source** | LibriSpeech clean-100 + clean-360 |
| **Language** | English |
| **DAC model** | 16kHz, 2 of 12 codebooks |
| **Codebook size** | 1,024 entries each |
| **Tokens per second** | 100 (50/codebook x 2) |
| **Token sequence length** | 149-2,047 (mean: 1,327) |

## Format

| Column | Type | Description |
|--------|------|-------------|
| `text` | string | Original text transcription |
| `prompt` | string | `{text}<\|audio_start\|><\|c1_X\|><\|c2_Y\|>...<\|audio_end\|>` |
| `input_ids` | list[int] | Pre-tokenized with Qwen3-0.6B + 2cb DAC tokens |
| `attention_mask` | list[int] | All 1s |
| `labels` | list[int] | Copy of input_ids |
| `n_audio_frames` | int | Number of DAC time frames |
| `n_tokens` | int | Total token count |

Audio tokens interleaved: `c1, c2, c1, c2, ...` per frame.
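The interleaved `prompt` layout described in the Format table can be illustrated with a small round-trip sketch. This is not the dataset author's code. It assumes the literal token strings are `<|audio_start|>`, `<|c1_X|>`, `<|c2_Y|>`, and `<|audio_end|>` (that is, the backslashes in the table are only markdown pipe escapes) and that no separator sits between the text and the audio tokens. The helper names `build_prompt`, `parse_prompt`, and `frames_to_seconds` are hypothetical.

```python
import re

# Assumed literal marker strings; the dataset card shows them with
# markdown-escaped pipes inside a table cell.
AUDIO_START = "<|audio_start|>"
AUDIO_END = "<|audio_end|>"
FRAMES_PER_SECOND = 50  # from the card: 50 tokens/sec per codebook

_TOKEN_RE = re.compile(r"<\|c([12])_(\d+)\|>")


def build_prompt(text, frames):
    """Serialize (c1, c2) code pairs, one pair per DAC time frame."""
    body = "".join(f"<|c1_{c1}|><|c2_{c2}|>" for c1, c2 in frames)
    return f"{text}{AUDIO_START}{body}{AUDIO_END}"


def parse_prompt(prompt):
    """Split a prompt back into its text and a list of (c1, c2) frames."""
    text, _, rest = prompt.partition(AUDIO_START)
    audio, _, _ = rest.partition(AUDIO_END)
    tokens = _TOKEN_RE.findall(audio)
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of audio tokens")
    frames = []
    for (cb1, v1), (cb2, v2) in zip(tokens[0::2], tokens[1::2]):
        # Each frame must be a c1 token followed by a c2 token.
        if cb1 != "1" or cb2 != "2":
            raise ValueError("audio tokens are not interleaved as c1, c2")
        frames.append((int(v1), int(v2)))
    return text, frames


def frames_to_seconds(n_audio_frames):
    """Approximate clip duration implied by the 50 frames/sec rate."""
    return n_audio_frames / FRAMES_PER_SECOND


prompt = build_prompt("hello world", [(5, 1023), (0, 7)])
text, frames = parse_prompt(prompt)
```

Under these assumptions, a row with `n_audio_frames = 500` corresponds to about 10 seconds of audio and contributes 1,000 audio tokens to `n_tokens`, since each frame emits one token per codebook.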

## Related

- **Training code:** [treadon/ri-tts](https://github.com/treadon/ri-tts) on GitHub
- **44kHz dataset (3cb):** [treadon/speech-dac-tokens-3cb](https://huggingface.co/datasets/treadon/speech-dac-tokens-3cb) (241K samples, kept for reference)

## Citation

```bibtex
@inproceedings{panayotov2015librispeech,
  title={Librispeech: an ASR corpus based on public domain audio books},
  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
  booktitle={ICASSP},
  year={2015}
}
```
Provider:
treadon