InstructS2S-200K
ModelScope community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/ICTNLP/InstructS2S-200K
Resource description:
# InstructS2S-200K
## Dataset Description
**InstructS2S-200K** is a multi-turn speech-to-speech conversation dataset containing approximately 200,000 dialogues, developed for the LLaMA-Omni and LLaMA-Omni 2 research projects on real-time spoken chatbots.
## Usage
The dataset is distributed as multiple archive parts. Concatenate them into a single `.tar.gz` file, then extract it:
```bash
# Combine the parts and extract
cat en_part_* > instructs2s_200k.tar.gz
tar -xzf instructs2s_200k.tar.gz
```
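The two commands above can also be combined into a single pass. The concatenated parts are piped straight into `tar`, so no intermediate `instructs2s_200k.tar.gz` copy is written to disk. The sketch below is a minimal illustration under that approach. The function name `reassemble_and_extract` and its arguments are hypothetical and are not provided by the dataset. Like `cat en_part_*`, the function assumes that the parts sort correctly by name, as with the `aa`, `ab`, ... suffixes produced by `split`.

```bash
# Hypothetical helper: join all parts matching a prefix and extract them in one pass.
# The body runs in a subshell so `set -eu` does not leak into the caller's shell.
reassemble_and_extract() (
  set -eu
  prefix=$1   # e.g. en_part_
  dest=$2     # directory to extract into
  # Expand the glob into positional parameters; pathname expansion is sorted,
  # so split-style suffixes (aa, ab, ...) come back in the correct order.
  set -- "$prefix"*
  if [ ! -e "$1" ]; then
    echo "no parts matching '${prefix}*'" >&2
    exit 1
  fi
  mkdir -p "$dest"
  echo "joining $# parts" >&2
  # Stream the concatenated parts into tar via stdin ("-f -").
  cat "$@" | tar -xzf - -C "$dest"
)

# Example: reassemble_and_extract en_part_ ./InstructS2S-200K
```

To inspect the archive contents without extracting anything, replace `-x` with `-t`, for example `cat en_part_* | tar -tzf - | head`. If the parts were instead numbered without zero padding (`_1`, `_2`, ..., `_10`), name sorting would place `_10` before `_2`. In that case, both this sketch and the original `cat` command would need the parts listed in numeric order explicitly.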
## License
This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It may be used for academic research only; commercial use is not permitted.
## Citation
```bibtex
@inproceedings{fang2025llamaomni2,
title={LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis},
author={Fang, Qingkai and Zhou, Yan and Guo, Shoutao and Zhang, Shaolei and Feng, Yang},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025}
}
@inproceedings{fang2025llamaomni,
title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
```
Provided by:
maas
Created:
2025-06-19



