THUDM/LongReward-10k
Source: Hugging Face · Updated 2024-10-29 · Indexed 2025-04-08
Download link:
https://hf-mirror.com/datasets/THUDM/LongReward-10k
Description:
The LongReward-10k dataset contains 10,000 long-context question-answering (QA) instances in English and Chinese, each up to 64,000 words long. It is divided into three parts: `sft`, which contains supervised fine-tuning data, and `dpo_glm4_9b` and `dpo_llama3.1_8b`, two long-context preference datasets for DPO training of the corresponding models (GLM-4-9B and Llama-3.1-8B, as the part names indicate).
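A minimal loading sketch using the Hugging Face `datasets` library follows. It assumes that the three parts named above are exposed as splits of the dataset's default configuration; the helper name `load_longreward` and the `SPLITS` tuple are hypothetical and not part of the dataset release.

```python
# Hypothetical helper for loading one part of THUDM/LongReward-10k.
# Assumption: "sft", "dpo_glm4_9b" and "dpo_llama3.1_8b" are splits of the
# default config; check the dataset card if loading fails.
SPLITS = ("sft", "dpo_glm4_9b", "dpo_llama3.1_8b")


def load_longreward(split: str):
    """Return one split of THUDM/LongReward-10k (downloads on first call)."""
    if split not in SPLITS:
        # Fail fast on a typo instead of making a network request.
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the validation above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("THUDM/LongReward-10k", split=split)
```

For example, `sft = load_longreward("sft")` would fetch the supervised fine-tuning part. Because the download link above points at hf-mirror.com, setting the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before importing `datasets` should route the download through that mirror.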
Provided by:
THUDM



