ICTNLP/InstructS2S-200K
Source: Hugging Face | Updated: 2025-11-30 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/ICTNLP/InstructS2S-200K
# InstructS2S-200K
## Dataset Description
**InstructS2S-200K** is a multi-turn speech-to-speech conversation dataset containing approximately 200,000 dialogues, developed for the LLaMA-Omni and LLaMA-Omni 2 research projects on real-time spoken chatbots.
## Usage
The dataset is distributed as a split archive (files named `en_part_*`). Concatenate the parts into a single `.tar.gz` archive, then extract it:
```bash
# Combine the parts and extract
cat en_part_* > instructs2s_200k.tar.gz
tar -xzf instructs2s_200k.tar.gz
```
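If disk space is tight, the parts can also be piped straight into `tar`, so the intermediate `instructs2s_200k.tar.gz` is never written to disk. The function below is a minimal sketch of that approach, not part of the official release: the name `reassemble_and_extract` and its destination argument are hypothetical, and it assumes the part suffixes sort in their original order (as the default `aa`, `ab`, ... suffixes produced by `split` do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the dataset authors): rebuild the split
# archive and extract it in one pass. Assumes parts are named en_part_*
# and that their suffixes sort in the original order (e.g. aa, ab, ...).
reassemble_and_extract() (
  set -euo pipefail           # fail on errors, including inside the pipe
  dest="${1:-.}"              # destination directory; defaults to cwd
  shopt -s nullglob           # an unmatched glob expands to nothing
  parts=(en_part_*)           # bash expands globs in sorted order
  if (( ${#parts[@]} == 0 )); then
    echo "no en_part_* files found in $(pwd)" >&2
    exit 1                    # exits the subshell, so the function returns 1
  fi
  mkdir -p "$dest"
  # Stream the concatenated parts into tar; no combined .tar.gz is written.
  cat "${parts[@]}" | tar -xzf - -C "$dest"
)

# Usage (run from the directory holding the parts):
# reassemble_and_extract ./InstructS2S-200K
```

Running the function body in a subshell `( ... )` keeps `set -euo pipefail` and `nullglob` from leaking into the caller's interactive shell.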
## License
This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It may be used for academic research only; commercial use is not permitted.
## Citation
```bibtex
@inproceedings{fang2025llamaomni2,
title={LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis},
author={Fang, Qingkai and Zhou, Yan and Guo, Shoutao and Zhang, Shaolei and Feng, Yang},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025}
}
@inproceedings{fang2025llamaomni,
title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
```
Provided by:
ICTNLP