InstructS2S-200K

Source: ModelScope community. Updated 2025-12-05. Indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/ICTNLP/InstructS2S-200K
Resource description:
# InstructS2S-200K

## Dataset Description

**InstructS2S-200K** is a multi-turn speech-to-speech conversation dataset containing approximately 200,000 dialogues, developed for the LLaMA-Omni and LLaMA-Omni 2 research projects on real-time spoken chatbots.

## Usage

The dataset is split into multiple parts, which must be concatenated into a single archive before extraction:

```bash
# Combine the parts and extract
cat en_part_* > instructs2s_200k.tar.gz
tar -xzf instructs2s_200k.tar.gz
```

## License

This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It is available for academic research only and may not be used for commercial purposes.

## Citation

```bibtex
@inproceedings{fang2025llamaomni2,
  title={LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis},
  author={Fang, Qingkai and Zhou, Yan and Guo, Shoutao and Zhang, Shaolei and Feng, Yang},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}

@inproceedings{fang2025llamaomni,
  title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
  author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```
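
## Optional: verifying the archive before extraction

The reassembly step described under Usage can be wrapped in a small function that checks the combined archive before unpacking it, which may help catch a missing or partially downloaded part. This is only a sketch under stated assumptions: the function name `reassemble_and_extract` and the default output directory are hypothetical and not part of the dataset, and it assumes the `en_part_*` files sort correctly by name.

```bash
#!/usr/bin/env bash
# Sketch: reassemble the split archive, verify it, then extract it.
# `reassemble_and_extract` is a hypothetical helper name, not shipped with the dataset.
reassemble_and_extract() {
  local parts_dir="${1:-.}"                  # directory holding the en_part_* files
  local out_dir="${2:-./instructs2s_200k}"   # assumed output directory
  local archive="${parts_dir}/instructs2s_200k.tar.gz"

  # The shell expands the glob in sorted name order, so parts are joined in sequence.
  cat "${parts_dir}"/en_part_* > "${archive}" || return 1

  # Test gzip integrity first; a truncated or missing part usually fails here.
  if ! gzip -t "${archive}"; then
    echo "archive is corrupt or incomplete: ${archive}" >&2
    return 1
  fi

  # Extract into the chosen directory instead of the current one.
  mkdir -p "${out_dir}"
  tar -xzf "${archive}" -C "${out_dir}"
}
```

A call such as `reassemble_and_extract ./downloads ./data` would read the parts from `./downloads` and unpack into `./data`; both paths are placeholders.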

Provider: maas
Created: 2025-06-19