
ICTNLP/InstructS2S-200K

Source: Hugging Face. Updated 2025-11-30. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/ICTNLP/InstructS2S-200K
Resource description:
# InstructS2S-200K

## Dataset Description

**InstructS2S-200K** is a multi-turn speech-to-speech conversation dataset containing approximately 200,000 dialogues, developed for the LLaMA-Omni and LLaMA-Omni 2 research projects on real-time spoken chatbots.

## Usage

The dataset is split into multiple parts and needs to be reconstructed:

```bash
# Combine the parts and extract
cat en_part_* > instructs2s_200k.tar.gz
tar -xzf instructs2s_200k.tar.gz
```

## License

This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It is available for academic research purposes only and may not be used for commercial purposes.

## Citation

```bibtex
@inproceedings{fang2025llamaomni2,
  title={LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis},
  author={Fang, Qingkai and Zhou, Yan and Guo, Shoutao and Zhang, Shaolei and Feng, Yang},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}

@inproceedings{fang2025llamaomni,
  title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
  author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```
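
The reconstruction step described under Usage can also be wrapped in a slightly more defensive form. The following is only a sketch, not part of the dataset card: the function name `reconstruct_instructs2s`, its two arguments, and the integrity check before extraction are all assumptions added for illustration. It relies on the same `en_part_*` naming and sorted glob expansion as the original commands.

```bash
#!/usr/bin/env bash
# Sketch: rebuild and unpack the archive from its split parts.
# Hypothetical helper; the name and arguments are not from the dataset card.

reconstruct_instructs2s() {
    local parts_dir="${1:-.}"   # directory holding the en_part_* files
    local out_dir="${2:-.}"     # extraction target (assumed argument)
    local archive="$parts_dir/instructs2s_200k.tar.gz"

    # Fail early if no parts are present.
    if ! ls "$parts_dir"/en_part_* >/dev/null 2>&1; then
        echo "no en_part_* files found in $parts_dir" >&2
        return 1
    fi

    # The shell expands the glob in sorted order, which is what the
    # original `cat en_part_*` command depends on.
    cat "$parts_dir"/en_part_* > "$archive" || return 1

    # Listing the archive reads the whole gzip stream, so a missing or
    # truncated part usually shows up here rather than mid-extraction.
    if ! tar -tzf "$archive" >/dev/null 2>&1; then
        echo "archive is incomplete or corrupt: $archive" >&2
        return 1
    fi

    mkdir -p "$out_dir" && tar -xzf "$archive" -C "$out_dir"
}
```

For example, `reconstruct_instructs2s ./downloads ./data` would combine the parts found in `./downloads` and extract them into `./data`; both paths here are placeholders.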
Provided by:
ICTNLP