JQL-AI/fw2_edu_scores
Source: Hugging Face. Updated 2025-08-07. Indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/JQL-AI/fw2_edu_scores
Description:
FineWeb2-JQL-Education is a model-annotated subset of FineWeb2 spanning 36 languages. It is filtered using model-based annotations for higher-quality training outcomes. Documents are scored by a deep learning classifier that was trained to identify educational samples using Snowflake's Arctic-embed-m-v2.0 embeddings. The dataset includes quality scores obtained from different classifiers, along with the Arctic-embed-m-v2.0 embeddings. It is derived from web content collected from 2013 to 2024 and may contain personally identifiable information (PII). The dataset card also discusses social impact, potential biases, and limitations.
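As a rough illustration of score-based filtering of the kind described above, the sketch below keeps only documents whose classifier score meets a threshold. The records, the field names `text` and `edu_score`, and the threshold value are all hypothetical; the dataset's actual column names and score ranges should be checked on its Hugging Face page.

```python
# Minimal sketch of threshold filtering on classifier scores.
# ASSUMPTION: "text" and "edu_score" are illustrative field names,
# not the dataset's documented schema.
records = [
    {"text": "Photosynthesis converts light into chemical energy.", "edu_score": 3.4},
    {"text": "Buy cheap watches now!!!", "edu_score": 0.2},
    {"text": "A triangle's interior angles sum to 180 degrees.", "edu_score": 2.9},
]


def filter_by_score(rows, threshold, score_key="edu_score"):
    """Return the rows whose score under `score_key` is at least `threshold`."""
    return [row for row in rows if row[score_key] >= threshold]


# ASSUMPTION: 2.5 is an arbitrary cutoff chosen for this example.
kept = filter_by_score(records, threshold=2.5)
for row in kept:
    print(row["edu_score"], row["text"])
```

In practice a cutoff would likely be chosen per language, since score distributions can differ across the 36 languages.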
Provider:
JQL-AI



