elise-en-nano-codec-dataset
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/nineninesix/elise-en-nano-codec-dataset
Description:
# Elise EN Nano-Codec Dataset
This dataset is built upon the [Elise dataset](https://huggingface.co/datasets/MrDragonFox/Elise) and re-encoded using NVIDIA’s [NeMo Audio Codec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps) into **nano audio tokens**.
It is designed for **fine-tuning multimodal LLMs** and **speech systems (TTS/ASR)** that rely on codec-based audio token representations.
---
## Dataset Structure
- **text**: transcription of the utterance.
- **speaker**: speaker identifier (string).
- **nano_layer_1 … nano_layer_4**: tokenized audio representations from the NVIDIA NeMo Nano Codec (4-layer quantization).
- **encoded_len**: sequence length of encoded audio tokens.
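
The fields above can be regrouped per audio frame. The sketch below is illustrative only: it assumes each `nano_layer_k` column is a list of integers with one entry per frame, and it takes the 12.5 frames-per-second rate from the codec's model name rather than from anything stated in this card. The helper names and the sample row are hypothetical.

```python
LAYER_KEYS = ("nano_layer_1", "nano_layer_2", "nano_layer_3", "nano_layer_4")

# Assumption: frame rate read off the "12.5fps" part of the codec's model name.
FRAME_RATE_HZ = 12.5


def frames_from_row(row):
    """Regroup the four per-layer token lists into one 4-tuple per frame."""
    layers = [row[key] for key in LAYER_KEYS]
    # Assumption: all four layers cover the same number of frames.
    if len({len(layer) for layer in layers}) != 1:
        raise ValueError("expected all four layers to have the same length")
    return list(zip(*layers))


def approx_duration_seconds(row):
    """Estimate utterance duration from encoded_len and the assumed frame rate."""
    return row["encoded_len"] / FRAME_RATE_HZ


# Made-up row with the same field names as the dataset (values are not real data).
example_row = {
    "text": "hello",
    "speaker": "demo",
    "nano_layer_1": [11, 12, 13, 14, 15],
    "nano_layer_2": [21, 22, 23, 24, 25],
    "nano_layer_3": [31, 32, 33, 34, 35],
    "nano_layer_4": [41, 42, 43, 44, 45],
    "encoded_len": 5,
}

frames = frames_from_row(example_row)
print(frames[0])
print(approx_duration_seconds(example_row))
```

Whether `encoded_len` always equals the length of each layer list is not confirmed here, so it is worth checking on a few real rows before relying on it.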
---
## Use Cases
- **Fine-tuning TTS** models with codec-based speech tokens.
- **Training ASR** systems that operate on discrete audio units.
- **Multimodal LLM adaptation**, where text and audio tokens are combined.
This format makes it easier to build compact and efficient speech-enabled LLMs.
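
One common way to combine text and audio tokens in a single LLM vocabulary is to flatten the four codec layers frame by frame and shift each layer into its own id range. The sketch below shows that idea; it is not necessarily how the dataset authors or any particular model do it. `codebook_size` and `audio_offset` are hypothetical parameters that must be set from the codec configuration and the text tokenizer actually in use.

```python
def flatten_audio_tokens(layers, codebook_size, audio_offset=0):
    """Interleave layer tokens frame by frame into one flat id sequence.

    layers: list of equal-length token lists, one per codec layer.
    codebook_size: number of distinct codes per layer (hypothetical parameter).
    audio_offset: first id reserved for audio, e.g. the text vocabulary size
        (hypothetical parameter).
    """
    if len({len(layer) for layer in layers}) != 1:
        raise ValueError("expected all layers to have the same length")
    flat = []
    for frame in zip(*layers):
        for layer_index, code in enumerate(frame):
            if not 0 <= code < codebook_size:
                raise ValueError(f"code {code} is outside the codebook range")
            # Shift layer k by k * codebook_size so ids from different layers never collide.
            flat.append(audio_offset + layer_index * codebook_size + code)
    return flat


# Toy values chosen for readability, not real codec tokens.
toy_layers = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(flatten_audio_tokens(toy_layers, codebook_size=10, audio_offset=100))
# [101, 113, 125, 137, 102, 114, 126, 138]
```

With this layout a sequence of `T` frames becomes `4 * T` tokens, which is the cost of keeping every layer in a single stream.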
---
## Example
```python
from datasets import load_dataset

# Load the training split of the dataset.
ds = load_dataset("nineninesix/elise-en-nano-codec-dataset", split="train")

# Transcription of the first utterance.
print(ds[0]["text"])
# "Ribbit Nice to meet you, Stephen."

# First ten codec tokens from layer 1.
print(ds[0]["nano_layer_1"][:10])
# [1633, 2685, 3825, 1392, ...]
```
---
## Credits
* Original data: [Elise dataset](https://huggingface.co/datasets/MrDragonFox/Elise).
* Audio codec tokenization: [NVIDIA NeMo Codec](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps).
Provider:
maas
Created:
2025-09-27



