root-signals/helpsteer2-binarized-granular-full
收藏Hugging Face2025-03-11 更新2025-04-08 收录
下载链接:
https://hf-mirror.com/datasets/root-signals/helpsteer2-binarized-granular-full
下载链接
链接失效反馈官方服务:
资源简介:
这是一个由NVIDIA创建的帮助引导数据集,包含通过Llama3标记器进行长度排序和二进制化的训练分割,分为单轮对话和多云对话子部分。500令牌分割包含500-1000令牌之间的选定响应,1000令牌分割包含1000+令牌的响应。如果一个例子至少包含一对“用户”和“助手”以及主要响应,则将其归类为多云对话。如果您不关心对话轮次,还有一个“combined”分割,它包含所有内容,但是请注意,不同分割之间的id是不相同的,合并时不会生效。
This is a Helpsteer dataset created by NVIDIA, containing training splits that are binarized and sorted by length using the Llama3 tokenizer, categorized into single-turn and multi-turn subparts. The 500 token splits contain chosen responses between 500-1000 tokens, while the 1000 token split contains responses of 1000+ tokens. An example is categorized as multi-turn if it has at least one pair of User and Assistant interactions in addition to the main response. If you are indifferent about the number of turns, there is a combined split which includes everything, but note that the ids are not the same between the splits and merging will not work.
提供机构:
root-signals



