allenai/Dolci-Think-DPO-32B

Name: allenai/Dolci-Think-DPO-32B
Creator: allenai
Published: 2025-11-20 13:56:40
License: 暂无描述

Hugging Face2025-11-20 更新2025-12-20 收录

下载链接：

https://hf-mirror.com/datasets/allenai/Dolci-Think-DPO-32B

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: prompt dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: role dtype: string - name: chosen_model dtype: string - name: rejected_model dtype: string - name: dataset dtype: string - name: prompt_id dtype: string - name: preference_type dtype: string splits: - name: train num_bytes: 4488651554 num_examples: 200000 download_size: 1882695224 dataset_size: 4488651554 configs: - config_name: default data_files: - split: train path: data/train-* license: odc-by --- # Dolci Think DPO Mixture This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). The Dolci Think DPO mixture was used to preference tune Olmo 3 Think 32B. It contains 200,000 preference pairs created with the preference heuristic described in [Delta Learning](https://arxiv.org/abs/2507.06187) (Geng et al. 2025).

提供机构：

allenai

5,000+

优质数据集

54 个

任务类型

进入经典数据集