hikitoxin/chai-ppo-rm-implicit
Source: Hugging Face. Updated: 2024-11-29. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/hikitoxin/chai-ppo-rm-implicit
Description:
The dataset has three features: chosen (the preferred text), rejected (the rejected text), and margin (the difference value between the two). It is split into a training set of 162819 samples and a test set of 1024 samples. The download size is 314960195 bytes and the total size is 566584477 bytes. The dataset is intended for training RP reward models, and better formatting tends to help training converge faster and reach a higher final accuracy.
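As a rough illustration of how the three features might be consumed, the sketch below models one record and a pairwise reward-model loss that takes the margin into account. The listing does not say how margin is used in training, so everything here is an assumption: the `PreferencePair` type, the field types (plain strings and a float), and the margin-shifted logistic loss are all hypothetical choices, not the dataset author's method.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One record with the three listed features (field types are assumed)."""
    chosen: str
    rejected: str
    margin: float


def pairwise_margin_loss(score_chosen: float, score_rejected: float, margin: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected - margin)).

    This is an assumed loss form, not one confirmed by the listing.
    It is computed as softplus(-z), branching on the sign of z so that
    math.exp never receives a large positive argument.
    """
    z = score_chosen - score_rejected - margin
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Toy usage with made-up scores standing in for reward-model outputs.
pair = PreferencePair(chosen="reply A", rejected="reply B", margin=0.5)
loss = pairwise_margin_loss(score_chosen=1.0, score_rejected=0.0, margin=pair.margin)
```

Under this assumed loss, a larger margin asks the model to separate the chosen and rejected scores by a wider gap before the loss becomes small.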
Provider:
hikitoxin



