survivi/Llama-3-SynE-Dataset
Hugging Face. Updated 2024-08-12. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/survivi/Llama-3-SynE-Dataset
Resource overview:
The Llama-3-SynE continual pre-training dataset is a high-quality dataset designed to enhance the Chinese language ability and scientific reasoning capability of the Llama-3 (8B) model. Through a carefully designed data mixture and curriculum strategy, it helps the model significantly improve its Chinese processing and scientific reasoning while maintaining its original performance. The dataset covers a range of topics and disciplines, aiming to strengthen the model's multidisciplinary scientific knowledge and Chinese language processing through continual pre-training.
Provider:
survivi



