Ichsan2895/DPO_ID-Wiki_10kTesting
Dataset processing
Data format conversion
```python
def return_prompt_and_responses(samples) -> dict[str, list[str]]:
    # `samples` is a batch: each column name maps to a list of values.
    return {
        "prompt": [
            "<|im_start|>user " + i + "<|im_end|> " for i in samples["PROMPT"]
        ],
        "chosen": [
            "<|im_start|>assistant " + j + "<|im_end|>" for j in samples["CHOSEN"]
        ],
        "rejected": [
            "<|im_start|>assistant " + k + "<|im_end|>" for k in samples["REJECTED"]
        ],
    }
```
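This function converts a batch of samples from the dataset into the format expected for DPO training: the `PROMPT` column becomes a user turn wrapped in `<|im_start|>` / `<|im_end|>` markers, and the `CHOSEN` and `REJECTED` columns become the preferred and rejected assistant turns.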
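To make the input and output shapes concrete, here is a minimal sketch that applies the function to a single-row batch. The `toy_batch` contents are hypothetical and are not taken from the dataset; only the column names `PROMPT`, `CHOSEN`, and `REJECTED` come from the code above.

```python
def return_prompt_and_responses(samples: dict[str, list[str]]) -> dict[str, list[str]]:
    # Same formatting logic as the function on this card.
    return {
        "prompt": ["<|im_start|>user " + i + "<|im_end|> " for i in samples["PROMPT"]],
        "chosen": ["<|im_start|>assistant " + j + "<|im_end|>" for j in samples["CHOSEN"]],
        "rejected": ["<|im_start|>assistant " + k + "<|im_end|>" for k in samples["REJECTED"]],
    }


# Hypothetical one-row batch, in the column-oriented layout a batched map passes in.
toy_batch = {
    "PROMPT": ["Apa ibu kota Indonesia?"],
    "CHOSEN": ["Jakarta."],
    "REJECTED": ["Bandung."],
}

formatted = return_prompt_and_responses(toy_batch)
for column, values in formatted.items():
    print(column, "->", repr(values[0]))
```

Each output column is a list with one entry per input row, so the function can be used unchanged on batches of any size.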
Dataset loading and mapping
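```python
from datasets import load_dataset

dataset = load_dataset(
    "Ichsan2895/DPO_ID-Wiki_10kTesting",
)
# Without `split`, load_dataset returns a DatasetDict whose column_names is a
# dict keyed by split, so read the column list from the train split.
original_columns = dataset["train"].column_names

# map returns a new dataset instead of modifying in place, so keep the result.
dataset = dataset.map(
    return_prompt_and_responses,
    batched=True,
    remove_columns=original_columns,
)
```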
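This code loads the dataset "Ichsan2895/DPO_ID-Wiki_10kTesting" and maps the formatting function over it in batches, removing the original columns and replacing them with the newly formatted `prompt`, `chosen`, and `rejected` columns.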
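The effect of a batched map with column removal can be sketched without downloading anything. The snippet below is a rough pure-Python emulation, not the `datasets` implementation: `emulate_batched_map`, `to_dpo_format`, and the two toy rows are hypothetical names and data introduced only for illustration, and the whole table is treated as a single batch.

```python
def emulate_batched_map(columns, fn, remove_columns):
    """Rough emulation of a batched map: call fn on a column-oriented batch,
    drop the listed columns, then add the columns that fn returned."""
    new_columns = fn(columns)
    kept = {name: values for name, values in columns.items() if name not in remove_columns}
    kept.update(new_columns)
    return kept


def to_dpo_format(samples):
    # Hypothetical stand-in for the formatting function on this card.
    return {
        "prompt": ["<|im_start|>user " + p + "<|im_end|> " for p in samples["PROMPT"]],
        "chosen": ["<|im_start|>assistant " + c + "<|im_end|>" for c in samples["CHOSEN"]],
        "rejected": ["<|im_start|>assistant " + r + "<|im_end|>" for r in samples["REJECTED"]],
    }


# Hypothetical two-row table in column-oriented form.
toy_table = {
    "PROMPT": ["Question A", "Question B"],
    "CHOSEN": ["Good answer A", "Good answer B"],
    "REJECTED": ["Bad answer A", "Bad answer B"],
}

result = emulate_batched_map(toy_table, to_dpo_format, remove_columns=list(toy_table))
print(sorted(result))
```

After the call, only the lowercase columns remain, which are the column names the trainer in the next section is given.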
DPO usage
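```python
from trl import DPOTrainer

# model, model_ref, tokenizer, and training_args are assumed to be defined earlier.
# The keyword names below follow the TRL release this card was written against;
# newer TRL releases may expect some of them elsewhere.
dpo_trainer = DPOTrainer(
    model,  # base model from SFT pipeline
    model_ref,  # typically a copy of the SFT trained base model
    beta=0.1,  # temperature hyperparameter of DPO
    train_dataset=dataset["train"],  # dataset prepared above
    tokenizer=tokenizer,  # tokenizer
    args=training_args,  # training arguments e.g. batch size, lr, etc.
)
```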
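This code shows how to set up training with `DPOTrainer`. It takes the base model, the reference model, the `beta` hyperparameter, the training dataset, the tokenizer, and the training arguments.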
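To illustrate what the `beta` hyperparameter controls, the following is a scalar sketch of the standard DPO objective for a single preference pair. It is not TRL's implementation; the function name `dpo_loss` and the log-probability values are hypothetical, and a real trainer computes these quantities from summed token log-probabilities over batches.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given sequence log-probabilities."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the margin between the two log-ratios.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the margin is zero.
loss_at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen response: the loss drops.
loss_improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
print(loss_at_start, loss_improved)
```

A larger `beta` makes the loss react more strongly to the same difference in log-ratios, which in practice keeps the trained policy closer to the reference model.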
Citation
@ONLINE{wikidump,
    author = "Wikimedia Foundation",
    title = "Wikimedia Downloads",
    url = "https://dumps.wikimedia.org"
}
@misc{vonwerra2022trl,
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang},
    title = {TRL: Transformer Reinforcement Learning},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
These citations give the source of the dataset (Wikimedia dumps) and the reference for the training library (TRL).



