HelpSteer2
Source: arXiv. Indexed 2025-09-30.
Download link:
https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM
Description:
This dataset contains approximately 10,000 human-written prompts, each paired with two responses sampled from 10 different large language models. Human annotators rate the quality of each response on a 5-point Likert scale. Notably, the ratings come solely from human annotation rather than from evaluations by external models. The dataset is intended for evaluating the quality of large language model responses, with the human-provided labels serving as the basis for scoring.



