chatml-OpenHermes2.5-dpo-binarized-alpha
ModelScope · updated 2025-11-12 · indexed 2025-03-22
Download link:
https://modelscope.cn/datasets/mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
Description:
# chatml-OpenHermes2.5-dpo-binarized-alpha
This is a DPO (Direct Preference Optimization) dataset based on [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha). It has the following features:
* **Intel format**: the dataset can be used directly in Axolotl with `type: chatml.intel`.
* **Filter out low scores**: removed samples with a `delta_score` of 1 or less, so only pairs with `delta_score` > 1 are kept (530 samples removed from the training set, 66 from the test set).
* **Curriculum learning**: the dataset is sorted by the `delta_score` column in descending order.
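The three features can be sketched end to end on a few toy records. This is a minimal, dependency-free illustration and not the pipeline used to build the dataset. The records and their scores are made up. `to_intel_prompt` is a hypothetical helper. It assumes that a `chatml.intel`-style loader in Axolotl wraps the `question` column in a ChatML user turn (no system prompt) and closes `chosen`/`rejected` with `<|im_end|>`. Check the Axolotl source for the exact template.

```python
from typing import Dict, List

# Toy records (made up for illustration) using this dataset's output columns.
records: List[Dict] = [
    {"question": "What is 2 + 2?", "chosen": "4", "rejected": "5", "delta_score": 3.0},
    {"question": "Name a primary color.", "chosen": "Red", "rejected": "Green", "delta_score": 1.0},
    {"question": "Capital of France?", "chosen": "Paris", "rejected": "Lyon", "delta_score": 5.0},
    {"question": "Opposite of hot?", "chosen": "Cold", "rejected": "Warm", "delta_score": 0.0},
]

# Filter: keep only pairs whose score gap is strictly greater than 1
# (the same test as `lambda x: x["delta_score"] > 1.0` in the reproduction code).
kept = [r for r in records if r["delta_score"] > 1.0]

# Curriculum ordering: largest score gap (presumably the clearest preference) first.
curriculum = sorted(kept, key=lambda r: r["delta_score"], reverse=True)


def to_intel_prompt(sample: Dict) -> Dict:
    """Hypothetical helper: ChatML-wrap one record the way a `chatml.intel`-style
    loader is assumed to do (no system prompt)."""
    return {
        "prompt": f"<|im_start|>user\n{sample['question']}<|im_end|>\n<|im_start|>assistant\n",
        "chosen": f"{sample['chosen']}<|im_end|>",
        "rejected": f"{sample['rejected']}<|im_end|>",
    }


formatted = [to_intel_prompt(r) for r in curriculum]
for row in formatted:
    print(repr(row["prompt"]), "->", repr(row["chosen"]))
```

The tie at exactly 1.0 is dropped by the strict `>` test. Only the two pairs with gaps 5.0 and 3.0 survive, in that order.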
## 💻 Code
Code to reproduce this dataset:
```python
# Install dependencies first (in a notebook, prefix the command with "!"):
# pip install -qqq datasets transformers pandas
from transformers import AutoTokenizer
from datasets import load_dataset

# Load the original dataset
dataset = load_dataset('argilla/OpenHermes2.5-dpo-binarized-alpha')

# Load a tokenizer whose chat template uses ChatML
model_name = "mlabonne/NeuralHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ChatML markers that surround the assistant turn
ASSISTANT_PREFIX = "<|im_start|>assistant\n"
TURN_SUFFIX = "<|im_end|>\n"

def chatml_format(example):
    # Render the user instruction alone; this is the prefix shared by both conversations
    message = {"role": "user", "content": example['input']}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=False)

    # Keep only the assistant reply: skip the prompt and assistant header, drop the closing tag
    start = len(prompt) + len(ASSISTANT_PREFIX)
    end = -len(TURN_SUFFIX)

    # Format chosen answer
    chosen = tokenizer.apply_chat_template(example['chosen'], tokenize=False, add_generation_prompt=False)[start:end]

    # Format rejected answer
    rejected = tokenizer.apply_chat_template(example['rejected'], tokenize=False, add_generation_prompt=False)[start:end]

    # Calculate score difference
    delta_score = abs(example['rating'][0] - example['rating'][1])

    return {
        "question": example["input"],
        "chosen": chosen,
        "rejected": rejected,
        "delta_score": delta_score,
    }

# Format dataset
dataset_chatml = dataset.map(
    chatml_format,
    remove_columns=['input', 'conversations', 'generation_model',
                    'generation_prompt', 'raw_generation_responses', 'generations',
                    'views', 'system_prompt', 'model_name', 'language', 'id', 'hash',
                    'model', 'avatarUrl', 'custom_instruction', 'topic', 'title',
                    'idx', 'rejected_score', 'chosen_score', 'source',
                    'skip_prompt_formatting', 'category', 'rating', 'chosen_model',
                    'rejected_model']
)

# Remove low delta scores (keep only delta_score > 1)
dataset_chatml = dataset_chatml.filter(lambda x: x["delta_score"] > 1.0)

# Sort the dataset by the 'delta_score' column in descending order
dataset_chatml = dataset_chatml.sort('delta_score', reverse=True)

# Inspect the 11th training sample (index 10)
dataset_chatml['train'].to_pandas().iloc[10]
```
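The slicing in `chatml_format` works because the rendered conversation is the rendered user prompt followed by the assistant turn. Below is a minimal sketch of that idea. It assumes the tokenizer uses a plain ChatML template, with each turn rendered as `<|im_start|>{role}\n{content}<|im_end|>\n` and no default system prompt. `render_chatml` is a hypothetical stand-in for `tokenizer.apply_chat_template(..., tokenize=False)`, and `extract_assistant` is likewise illustrative.

```python
from typing import Dict, List

PREFIX = "<|im_start|>assistant\n"
SUFFIX = "<|im_end|>\n"


def render_chatml(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for apply_chat_template(tokenize=False) with a plain ChatML template."""
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)


def extract_assistant(question: str, conversation: List[Dict[str, str]]) -> str:
    """Recover the bare assistant reply by slicing off the rendered prompt and the closing tag."""
    prompt = render_chatml([{"role": "user", "content": question}])
    full = render_chatml(conversation)
    # The length-based slice is only valid if the rendered text has exactly this layout.
    if not (full.startswith(prompt + PREFIX) and full.endswith(SUFFIX)):
        raise ValueError("unexpected chat template layout")
    return full[len(prompt) + len(PREFIX):-len(SUFFIX)]


conversation = [
    {"role": "user", "content": "Capital of France?"},
    {"role": "assistant", "content": "Paris."},
]
print(extract_assistant("Capital of France?", conversation))  # Paris.
```

The slice depends only on string lengths. A template that injects a default system prompt or uses different whitespace would therefore shift it silently, and the explicit layout check turns that into an error.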
Provider:
maas
Created:
2025-03-18