Asap7772/prm800k_onpolicy_multiturn_rtg_prefix0.2_roll4_maxrev100
Source: Hugging Face. Updated 2024-09-25; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Asap7772/prm800k_onpolicy_multiturn_rtg_prefix0.2_roll4_maxrev100
Description:
The dataset contains the following fields, among others: input, output prefix, output, terminal flag, completion flag, trajectory, step, label, reward, is chosen, rtg, undiscounted rtg, and text. It is split into a training set of 13,708,504 examples and a test set of 13,737,759 examples. The data file paths are specified in the dataset configuration.
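The listing does not say how the rtg and undiscounted rtg fields relate to the reward field. As a minimal sketch, assuming "rtg" stands for return-to-go (the sum of rewards from a step to the end of its trajectory) and that the two fields differ only in the discount factor, the relationship could look like the following. The function name `returns_to_go`, the discount value 0.5, and the sample rewards are all hypothetical and are not taken from the dataset.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one trajectory.

    gamma = 1.0 gives the undiscounted return-to-go.
    This is an illustrative assumption, not the dataset author's documented method.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step can reuse the next step's return.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical three-step trajectory with a reward only at the final step.
rewards = [0.0, 0.0, 1.0]
undiscounted = returns_to_go(rewards, gamma=1.0)
discounted = returns_to_go(rewards, gamma=0.5)
```

Under these assumptions, every step of the undiscounted version carries the full final reward, while the discounted version shrinks the value for steps further from the end of the trajectory.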
Provider:
Asap7772



