five

sumuks/helpsteer3-dpo-style

收藏
Hugging Face2026-03-26 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/sumuks/helpsteer3-dpo-style
下载链接
链接失效反馈
官方服务:
资源简介:
--- pretty_name: HelpSteer3 DPO Style language: - en - zh - ko - fr - es - ru - ja - de - it - pt - pl - id - nl - vi license: cc-by-4.0 task_categories: - text-generation tags: - dpo - preference-optimization - human-feedback - reward-modeling - helpsteer - nvidia size_categories: - 10K<n<100K configs: - config_name: default data_files: - split: train path: train-00000-of-00001.parquet - split: test path: test-00000-of-00001.parquet --- # Dataset Card for HelpSteer3 DPO Style ## Dataset Summary This dataset is derived from `nvidia/HelpSteer3` using the `preference` config. Each source row contains a chat-style `context`, two candidate responses, and a signed `overall_preference` score. This conversion keeps only strong preferences with `abs(overall_preference) >= 2` and maps them into DPO-style `chosen` and `rejected` rows. The original HelpSteer3 `validation` split is written as `test` here to match the train/test convention used elsewhere in this repo. ## Dataset Structure - Train source rows: 38459 - Test source rows: 2017 - Train DPO rows: 23959 - Test DPO rows: 1288 - Total DPO rows: 25247 - Dropped weak-preference rows: 15229 Each row contains these key fields: - `prompt`: Rendered conversation transcript from the source `context`. - `chosen`: Preferred assistant response chosen from `response1` or `response2`. - `rejected`: Less-preferred assistant response chosen from the other source response. - `difficulty`: `1 / abs(overall_preference)`, so `0.5` for `±2` and `0.333...` for `±3`. ## Construction Notes - Negative `overall_preference` values mean `response1` is preferred. - Positive `overall_preference` values mean `response2` is preferred. - Rows with scores `-1`, `0`, and `1` are dropped as too weak for this dataset. - Difficulty uses the absolute preference value because the sign only indicates which side won, not how hard the pair is. - No prompt-level regrouping is needed because HelpSteer3 already ships train and validation splits.
提供机构:
sumuks
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作