opencsg/smoltalk-chinese
Source: Hugging Face. Updated 2025-12-05; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/opencsg/smoltalk-chinese
Description:
Chinese SmolTalk is a Chinese fine-tuning dataset built by the OpenCSG community with reference to the SmolTalk dataset. It aims to provide high-quality synthetic data for training large language models (LLMs). The dataset consists entirely of synthetic data, comprising over 700,000 entries, and is designed to improve the performance of Chinese LLMs across a range of tasks, enhancing their versatility and adaptability. It is made up of several parts covering a broad set of task types: information retrieval, reasoning, planning, editing, programming, mathematics, role-playing, data analysis, creative writing, advice seeking, and brainstorming. The construction process is described as following strict standards to ensure the quality and diversity of the data.
Provider:
opencsg



