teamcore/DPO_Pm3B_RMAB_TG_clean_beta0.25sigmoidEurus_RM_7b
Source: Hugging Face. Updated: 2025-11-08. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/teamcore/DPO_Pm3B_RMAB_TG_clean_beta0.25sigmoidEurus_RM_7b
Resource description:
This dataset contains task-selection records. Each sample includes fields such as task type, chosen item, rejected item, chosen distribution, rejected distribution, decision probability, raw decision, task category, a flag for whether the task is an original task, prompt, fixed prompt, fixed chosen item, fixed rejected item, and response. The dataset has a single default split containing 2,000 examples, with a total size of 24,650,100 bytes.
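The chosen/rejected fields described above suggest a preference-pair layout. The following is a minimal sketch, not the publisher's documented usage, of how a record might be reduced to a prompt/chosen/rejected triplet. The column names `prompt`, `chosen`, and `rejected` and the split name `train` are assumptions, since the listing gives only field descriptions and not the exact identifiers. Loading relies on `load_dataset` from the Hugging Face `datasets` library.

```python
from typing import Any, Dict, Mapping

DATASET_ID = "teamcore/DPO_Pm3B_RMAB_TG_clean_beta0.25sigmoidEurus_RM_7b"

# Hypothetical column names: the listing describes the fields but does not
# give their exact identifiers, so adjust these after inspecting the data.
PROMPT_KEY = "prompt"
CHOSEN_KEY = "chosen"
REJECTED_KEY = "rejected"


def to_preference_triplet(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the prompt, chosen, and rejected fields of one record."""
    required = (PROMPT_KEY, CHOSEN_KEY, REJECTED_KEY)
    missing = [key for key in required if key not in record]
    if missing:
        raise KeyError(f"record is missing expected fields: {missing}")
    return {
        "prompt": record[PROMPT_KEY],
        "chosen": record[CHOSEN_KEY],
        "rejected": record[REJECTED_KEY],
    }


def load_examples(split: str = "train"):
    """Load the dataset from the Hugging Face Hub (needs network access).

    The split name "train" is an assumption; the listing only mentions a
    single default split.
    """
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_examples()` and printing the `column_names` attribute of the result is a quick way to confirm the real field names before relying on the assumed keys.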
Provider:
teamcore



