TurkuNLP/finerweb-10bt
Source: Hugging Face. Updated: 2025-01-17. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/TurkuNLP/finerweb-10bt
Description:
The FinerWeb-10BT dataset extends the FineWeb-10BT sample with a quality score for each line of text. Curated by Erik Henriksson, Otto Tarkka, and Filip Ginter, it uses an LLM-based quality filtering pipeline to assign every text line a score between 0.0 and 1.0, with higher scores indicating higher-quality content.
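As an illustration of how per-line scores like these might be used, the following minimal sketch keeps only the lines of one document whose score meets a threshold. The function name `filter_lines`, the sample lines, the sample scores, and the 0.5 threshold are all hypothetical; the dataset's actual field names and any recommended threshold are not stated in this listing.

```python
from typing import List


def filter_lines(lines: List[str], scores: List[float], threshold: float = 0.5) -> List[str]:
    """Keep the lines whose quality score is at or above the threshold.

    Assumes one score per line, each in the range 0.0 to 1.0.
    """
    if len(lines) != len(scores):
        raise ValueError("expected exactly one score per line")
    return [line for line, score in zip(lines, scores) if score >= threshold]


# Hypothetical document: three lines with made-up quality scores.
doc_lines = [
    "Finland has thousands of lakes.",
    "Click here to subscribe!",
    "Most of them were formed by glaciers.",
]
doc_scores = [0.92, 0.08, 0.75]

kept = filter_lines(doc_lines, doc_scores, threshold=0.5)
print("\n".join(kept))
```

Raising the threshold trades recall for cleaner text, so a suitable value would depend on the downstream use.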
Provider:
TurkuNLP



