opendatalab/SlimPajama-Meta-rater
Source: Hugging Face | Updated: 2025-06-14 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/opendatalab/SlimPajama-Meta-rater
Description:
This dataset contains approximately 580 billion tokens, each document annotated across 25 quality dimensions. It is intended to support data-centric research on large language models and includes natural language quality signals, data importance scores, and model-based quality ratings. The dataset is partitioned by domain and fully annotated, giving researchers and developers a single comprehensive resource.
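As a rough illustration of how per-document quality annotations like these might be used for data selection, the sketch below combines several scores into one weighted value and keeps the highest-ranked documents. The field names (`text`, `scores`, `quality_signal`, `importance`, `model_rating`), the weights, and the sample records are all hypothetical. They are not taken from the dataset's actual schema, which should be checked on the dataset page before adapting this.

```python
from typing import Dict, List

# Hypothetical record layout: {"text": str, "scores": {dimension_name: float}}.
# The real dataset's column names may differ; these are assumptions for illustration.
Record = Dict[str, object]


def aggregate_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the quality dimensions named in `weights`."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def select_top(records: List[Record], weights: Dict[str, float], k: int) -> List[Record]:
    """Return the k records with the highest aggregated quality score."""
    ranked = sorted(
        records,
        key=lambda r: aggregate_score(r["scores"], weights),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample records standing in for annotated documents.
records = [
    {"text": "doc a", "scores": {"quality_signal": 0.9, "importance": 0.2, "model_rating": 0.8}},
    {"text": "doc b", "scores": {"quality_signal": 0.4, "importance": 0.9, "model_rating": 0.5}},
    {"text": "doc c", "scores": {"quality_signal": 0.1, "importance": 0.1, "model_rating": 0.2}},
]

# Arbitrary illustrative weights, not values from the dataset authors.
weights = {"quality_signal": 1.0, "importance": 1.0, "model_rating": 2.0}

top = select_top(records, weights, k=2)
print([r["text"] for r in top])
```

With 25 dimensions available, the same pattern extends by adding more entries to `weights`. How those weights should be chosen is a research question this sketch does not address.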
Provider:
opendatalab



