selfcorrexp2/balanced_self_rewarding_rm_labeled_llama3_sft_gen_1round
Source: Hugging Face. Updated: 2025-01-23. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/selfcorrexp2/balanced_self_rewarding_rm_labeled_llama3_sft_gen_1round
Description:
This is a structured dataset intended for training machine learning models. Each example has eight fields: an index, a prompt text, a sequence of boolean rewards, a sequence of answer strings, a ground-truth label string, a boolean proxy label, a second sequence of boolean rewards, and a sequence of predicted rewards stored as floats. The dataset provides a training split containing 15,000 examples.
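As a rough illustration of the record layout described above, the following Python sketch models one example with the listed field types. The field names (`idx`, `prompt`, `rewards`, and so on) are hypothetical placeholders, since the description gives only the field kinds and not their actual column names, and the sample values are made up for demonstration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """One example, mirroring the field types listed in the description.

    All field names are hypothetical; check the dataset viewer for the real ones.
    """
    idx: int                    # index
    prompt: str                 # prompt text
    rewards: List[bool]         # reward boolean sequence
    answers: List[str]          # answer string sequence
    gt: str                     # ground-truth label string
    proxy_label: bool           # proxy label boolean
    second_rewards: List[bool]  # second reward boolean sequence
    pred_rewards: List[float]   # predicted reward float sequence


def reward_rate(record: Record) -> float:
    """Fraction of True entries in the first reward sequence (0.0 if empty)."""
    if not record.rewards:
        return 0.0
    return sum(record.rewards) / len(record.rewards)


# Made-up sample values, for illustration only (not taken from the dataset).
example = Record(
    idx=0,
    prompt="What is 2 + 2?",
    rewards=[True, False],
    answers=["4", "5"],
    gt="4",
    proxy_label=True,
    second_rewards=[True, False],
    pred_rewards=[0.9, 0.1],
)
print(reward_rate(example))
```

To work with the real data, the Hugging Face `datasets` library's `load_dataset("selfcorrexp2/balanced_self_rewarding_rm_labeled_llama3_sft_gen_1round", split="train")` should return rows as dictionaries; the actual keys would replace the placeholder names used here.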
Provider:
selfcorrexp2



