allenai/Dolci-Think-RL-7B-Completions-DPO
Hugging Face | Updated 2025-12-12 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/allenai/Dolci-Think-RL-7B-Completions-DPO
Description:
Dolci-Think-Completions-DPO is a set of 4,345,797 completions generated by the Olmo-3-7B-Think-DPO model over the prompts considered when building Dolci-Think-RL. It contains 556,095 high-quality prompts covering Math, Code, Precise Instruction Following, General Chat, and Puzzles. Each split covers one of these domains, and the original_dataset column records the source dataset. Sources include multiple public datasets and papers, such as IF Multi-Constraint, OMEGA Math, and AceCoder. Processing steps included keyword and topic filtering, execution-based test-case validation, and F1-score filtering, among others.
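Because each row carries an original_dataset column, a quick way to see how a split is composed is to tally rows by source. The following is a minimal sketch, not an official loader: the sample rows, the "completion" field name, and the split name "math" in the comment are assumptions for illustration, so check the dataset card for the real schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_source(rows: Iterable[Mapping[str, object]],
                    column: str = "original_dataset") -> Counter:
    """Count rows per value of `column` (the source-dataset field)."""
    return Counter(str(row[column]) for row in rows)


# Hypothetical rows for illustration only: the source names are made up,
# and the "completion" field name is an assumption.
sample_rows = [
    {"original_dataset": "source_a", "completion": "..."},
    {"original_dataset": "source_b", "completion": "..."},
    {"original_dataset": "source_a", "completion": "..."},
]
counts = count_by_source(sample_rows)

# With the real data, rows could be streamed through the Hugging Face
# `datasets` library. The split name "math" is an assumption:
#   from datasets import load_dataset
#   ds = load_dataset("allenai/Dolci-Think-RL-7B-Completions-DPO",
#                     split="math", streaming=True)
#   counts = count_by_source(ds.take(1000))
```

Streaming with a small `take` avoids downloading all completions just to inspect the source mix.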
Provided by:
allenai



