allenai/tulu-3-sft-olmo-2-mixture
收藏Hugging Face2024-12-02 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/allenai/tulu-3-sft-olmo-2-mixture
下载链接
链接失效反馈官方服务:
资源简介:
OLMo v2 SFT mixture数据集是一个用于训练OLMo模型的多语言数据集,包含939,344个样本。这些样本来自多个不同的子集,每个子集都有不同的许可证。数据集的结构包括每个样本的唯一标识符、用于监督微调的消息格式以及样本的来源数据集。数据集支持多种语言,并且是多语言的。数据集的许可证为ODC-BY-1.0,适用于研究和教育用途。
The OLMo v2 SFT mixture dataset is a multilingual dataset used to train OLMo models, containing 939,344 samples. These samples are sourced from multiple subsets, each with different licenses. The dataset structure includes a unique identifier for each sample, a message format used for supervised fine-tuning, and the source dataset for the sample. The dataset supports multiple languages and is multilingual. The dataset is licensed under ODC-BY-1.0 and is intended for research and educational use.
提供机构:
allenai



