ultrafeedback-binarized-preferences-cleaned
收藏魔搭社区2025-10-09 更新2025-03-22 收录
下载链接:
https://modelscope.cn/datasets/mlabonne/ultrafeedback-binarized-preferences-cleaned
下载链接
链接失效反馈官方服务:
资源简介:
# ultrafeedback-binarized-preferences-cleaned
This is a DPO dataset based on [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned). It implements the following features:
* **Intel format**: you can directly use this dataset in Axolotl with "type: chatml.intel"
* **Filter out low scores**: removed samples with delta scores assistant "):-len(" ")]
# Format rejected answer
rejected = tokenizer.apply_chat_template(example['rejected'], tokenize=False, add_generation_prompt=False)[len(input) + len("assistant "):-len(" ")]
# Calculate score difference
delta_score = abs(example['chosen-rating'] - example['rejected-rating'])
return {
"question": example['prompt'],
"chosen": chosen,
"rejected": rejected,
"delta_score": delta_score,
}
# Load tokenizer (chatml format)
model_name = "mlabonne/NeuralHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Format dataset
dataset_chatml = dataset.map(
chatml_format,
remove_columns=['chosen-rating', 'rejected-rating', 'chosen-model', 'rejected-model', 'source', 'prompt'],
)
# Remove low delta scores
dataset_chatml = dataset_chatml.filter(lambda x: x["delta_score"] > 1.0)
# Sort the dataset by the 'delta_score' column in descending order
dataset_chatml = dataset_chatml.sort('delta_score', reverse=True)
pd.DataFrame(dataset_chatml['train']).iloc[10:20]
```
# ultrafeedback-binarized-preferences-cleaned
本数据集为基于[argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)构建的直接偏好优化(Direct Preference Optimization,DPO)数据集,具备以下特性:
* **Intel格式**:可直接在Axolotl中结合`"type: chatml.intel"`配置使用该数据集
* **过滤低得分差值样本**:移除得分差值过低的样本
# 格式化拒绝候选回答
rejected = tokenizer.apply_chat_template(example['rejected'], tokenize=False, add_generation_prompt=False)[len(input) + len("assistant "):-len(" ")]
# 计算得分差值
delta_score = abs(example['chosen-rating'] - example['rejected-rating'])
return {
"question": example['prompt'],
"chosen": chosen,
"rejected": rejected,
"delta_score": delta_score,
}
# 加载ChatML格式分词器
model_name = "mlabonne/NeuralHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 格式化数据集
dataset_chatml = dataset.map(
chatml_format,
remove_columns=['chosen-rating', 'rejected-rating', 'chosen-model', 'rejected-model', 'source', 'prompt'],
)
# 移除低得分差值样本
dataset_chatml = dataset_chatml.filter(lambda x: x["delta_score"] > 1.0)
# 按'delta_score'列降序排序数据集
dataset_chatml = dataset_chatml.sort('delta_score', reverse=True)
pd.DataFrame(dataset_chatml['train']).iloc[10:20]
提供机构:
maas
创建时间:
2025-03-18



