mlx-community/Human-Like-DPO
Source: Hugging Face | Updated: 2025-05-27 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/mlx-community/Human-Like-DPO
Description:
This is a test dataset for Direct Preference Optimization (DPO) training. It contains 1,000 examples, divided into training, validation, and test sets. Each example consists of an input text or question and two model-generated responses: one preferred and one less preferred.
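The record layout described above can be sketched as a small Python type. This is only an illustration under stated assumptions: the field names `prompt`, `chosen`, and `rejected` are hypothetical (the listing does not give the dataset's actual column names), and the sample record is made up rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    # Field names are assumptions for illustration, not the dataset's schema.
    prompt: str    # input text or question
    chosen: str    # preferred response
    rejected: str  # less preferred response


def parse_pair(record: dict) -> PreferencePair:
    """Build a PreferencePair from a dict, rejecting incomplete records."""
    pair = PreferencePair(
        prompt=record["prompt"],
        chosen=record["chosen"],
        rejected=record["rejected"],
    )
    for name in ("prompt", "chosen", "rejected"):
        value = getattr(pair, name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field {name!r} must be a non-empty string")
    # A preference pair carries no signal if both responses are identical.
    if pair.chosen == pair.rejected:
        raise ValueError("chosen and rejected responses must differ")
    return pair


# Made-up record, used only to show the shape of one example.
example = parse_pair({
    "prompt": "How was your weekend?",
    "chosen": "Pretty relaxing, thanks! I finally got some reading done.",
    "rejected": "I do not have weekends.",
})
```

A check like this may be useful before DPO training, since the loss compares the preferred response against the less preferred one for the same prompt.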
Provider:
mlx-community



