
Ichsan2895/DPO_ID-Wiki_10kTesting

Source: Hugging Face. Updated 2023-11-25. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Ichsan2895/DPO_ID-Wiki_10kTesting
Resource description:
---
license: cc-by-nc-sa-4.0
---

## HOW TO WRANGLE THIS DATASET INTO DPO & CHATML FORMAT

```
from datasets import load_dataset


def return_prompt_and_responses(samples) -> dict[str, list[str]]:
    # `samples` is a batch: each key maps to a list of strings
    return {
        "prompt": [
            "<|im_start|>user\n" + i + "<|im_end|>\n"
            for i in samples["PROMPT"]
        ],
        "chosen": [
            "<|im_start|>assistant\n" + j + "<|im_end|>"
            for j in samples["CHOSEN"]
        ],
        "rejected": [
            "<|im_start|>assistant\n" + k + "<|im_end|>"
            for k in samples["REJECTED"]
        ],
    }


dataset = load_dataset(
    "Ichsan2895/DPO_ID-Wiki_10kTesting",
)

# Without `split`, load_dataset returns a DatasetDict,
# so the column names are read from the train split
original_columns = dataset["train"].column_names

# map returns a new object rather than modifying in place, so keep the result
dataset = dataset.map(
    return_prompt_and_responses,
    batched=True,
    remove_columns=original_columns
)
```

## HOW TO USE DPO

```
from trl import DPOTrainer

# Keyword names follow the TRL release current when this card was written;
# newer releases may differ
dpo_trainer = DPOTrainer(
    model,  # base model from SFT pipeline
    model_ref,  # typically a copy of the SFT trained base model
    beta=0.1,  # temperature hyperparameter of DPO
    train_dataset=dataset['train'],  # dataset prepared above
    tokenizer=tokenizer,  # tokenizer
    args=training_args,  # training arguments e.g. batch size, lr, etc.
)
```

## CITATION

```
@ONLINE{wikidump,
    author = "Wikimedia Foundation",
    title  = "Wikimedia Downloads",
    url    = "https://dumps.wikimedia.org"
}

@misc{vonwerra2022trl,
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang},
    title = {TRL: Transformer Reinforcement Learning},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
Provided by:
Ichsan2895
Summary of original information

Dataset processing method

Data format conversion

```python
def return_prompt_and_responses(samples) -> dict[str, list[str]]:
    # `samples` is a batch: each key maps to a list of strings
    return {
        "prompt": [
            "<|im_start|>user\n" + i + "<|im_end|>\n"
            for i in samples["PROMPT"]
        ],
        "chosen": [
            "<|im_start|>assistant\n" + j + "<|im_end|>"
            for j in samples["CHOSEN"]
        ],
        "rejected": [
            "<|im_start|>assistant\n" + k + "<|im_end|>"
            for k in samples["REJECTED"]
        ],
    }
```

This function converts a batch of samples into ChatML-formatted columns: the user prompt, the chosen response, and the rejected response.
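To make the output format concrete, here is a minimal sketch that applies the same formatting to a one-row batch. The sample text is invented for illustration and is not taken from the dataset; only the column names `PROMPT`, `CHOSEN`, and `REJECTED` come from the card.

```python
def return_prompt_and_responses(samples: dict[str, list[str]]) -> dict[str, list[str]]:
    """Wrap each raw column in ChatML role markers."""
    return {
        "prompt": ["<|im_start|>user\n" + p + "<|im_end|>\n" for p in samples["PROMPT"]],
        "chosen": ["<|im_start|>assistant\n" + c + "<|im_end|>" for c in samples["CHOSEN"]],
        "rejected": ["<|im_start|>assistant\n" + r + "<|im_end|>" for r in samples["REJECTED"]],
    }


# Hypothetical one-row batch; the strings are made up for this example.
batch = {
    "PROMPT": ["Apa ibu kota Indonesia?"],
    "CHOSEN": ["Jakarta."],
    "REJECTED": ["Bandung."],
}

formatted = return_prompt_and_responses(batch)
print(formatted["prompt"][0])
print(formatted["chosen"][0])
print(formatted["rejected"][0])
```

The prompt ends with a newline after `<|im_end|>` so that the assistant turn starts on its own line when prompt and response are concatenated.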

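Dataset loading and mapping

```python
from datasets import load_dataset

dataset = load_dataset(
    "Ichsan2895/DPO_ID-Wiki_10kTesting",
)

# Without `split`, load_dataset returns a DatasetDict,
# so the column names are read from the train split
original_columns = dataset["train"].column_names

# map returns a new object rather than modifying in place, so keep the result
dataset = dataset.map(
    return_prompt_and_responses,
    batched=True,
    remove_columns=original_columns
)
```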

This code loads the dataset "Ichsan2895/DPO_ID-Wiki_10kTesting" and maps it, removing the original columns and replacing them with the newly formatted ones.
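The effect of `batched=True` together with `remove_columns` can be sketched in plain Python. This is an emulation for illustration only, not how the `datasets` library is implemented; `map_batched`, `to_chatml`, and the toy rows are hypothetical names and data.

```python
def to_chatml(batch):
    """Receive a dict of column -> list, return a dict of new column -> list."""
    return {
        "prompt": ["<|im_start|>user\n" + p + "<|im_end|>\n" for p in batch["PROMPT"]],
        "chosen": ["<|im_start|>assistant\n" + c + "<|im_end|>" for c in batch["CHOSEN"]],
        "rejected": ["<|im_start|>assistant\n" + r + "<|im_end|>" for r in batch["REJECTED"]],
    }


def map_batched(rows, fn, batch_size=2):
    """Apply fn to column-oriented chunks and keep only the columns fn returns."""
    mapped = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Turn the chunk of row dicts into one dict of lists, as a batched map does.
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_columns = fn(batch)
        # Rebuild rows from the new columns only; the original columns are dropped.
        for i in range(len(chunk)):
            mapped.append({key: values[i] for key, values in new_columns.items()})
    return mapped


# Toy rows standing in for dataset records.
rows = [
    {"PROMPT": "p1", "CHOSEN": "c1", "REJECTED": "r1"},
    {"PROMPT": "p2", "CHOSEN": "c2", "REJECTED": "r2"},
    {"PROMPT": "p3", "CHOSEN": "c3", "REJECTED": "r3"},
]

mapped = map_batched(rows, to_chatml)
print(sorted(mapped[0]))
```

Because the function receives lists rather than single strings, it uses list comprehensions over each column, which is why the formatting function in the card is written that way.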

How to use DPO

```python
from trl import DPOTrainer

# Keyword names follow the TRL release current when this card was written;
# newer releases may differ
dpo_trainer = DPOTrainer(
    model,  # base model from SFT pipeline
    model_ref,  # typically a copy of the SFT trained base model
    beta=0.1,  # temperature hyperparameter of DPO
    train_dataset=dataset["train"],  # dataset prepared above
    tokenizer=tokenizer,  # tokenizer
    args=training_args,  # training arguments e.g. batch size, lr, etc.
)
```

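This code shows how to train with DPOTrainer. The call takes the base model, the reference model, the beta hyperparameter, the training dataset, the tokenizer, and the training arguments.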
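To show what `beta` and the reference model contribute, the sketch below computes the standard sigmoid form of the DPO loss for a single preference pair. It is a hand-written illustration, not TRL's implementation; the function name `dpo_loss` and the log-probability values are assumptions made up for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one pair, given sequence log-probabilities."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the margin is zero and the loss is log(2).
loss_at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response: the loss drops.
loss_after_shift = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(loss_at_start, loss_after_shift)
```

A larger `beta` scales the margin, so the same shift away from the reference changes the loss more strongly.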

Citation

```
@ONLINE{wikidump,
    author = "Wikimedia Foundation",
    title  = "Wikimedia Downloads",
    url    = "https://dumps.wikimedia.org"
}

@misc{vonwerra2022trl,
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang},
    title = {TRL: Transformer Reinforcement Learning},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

These citations give the source of the dataset and a reference for the related tooling.
