styal/filtered-fineweb-edu
Hugging Face | Updated: 2026-02-16 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/styal/filtered-fineweb-edu
Description:
This dataset was created by filtering the fineweb-edu-dedup subset of smollm-corpus with the following function:
```
def my_filter(example):
    # Keep English documents of at most 3000 tokens whose combined score exceeds 0.95 * 3.8
    m = example["metadata"]
    return (m["token_count"] <= 3000
            and m["score"] * m["language_score"] > 0.95 * 3.8
            and m["language"] == "en")
```
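The combined-score condition multiplies the educational `score` by the `language_score` and requires the product to be strictly greater than 0.95 × 3.8 = 3.61, while the length check is inclusive. The sketch below runs the same predicate over a few hypothetical records (the values are invented for illustration and are not taken from the dataset) to show those boundaries: a document of exactly 3000 tokens passes, but a product of exactly 3.61 does not.

```python
# Sketch: the filter above applied to hypothetical records.
# Assumption: records use the {"metadata": {...}} layout the filter reads;
# all sample values below are invented for illustration.

THRESHOLD = 0.95 * 3.8  # roughly 3.61


def my_filter(example):
    """True for English docs of <= 3000 tokens with score * language_score > 3.61."""
    m = example["metadata"]
    return (m["token_count"] <= 3000
            and m["score"] * m["language_score"] > THRESHOLD
            and m["language"] == "en")


samples = [
    # 4.0 * 0.95 = 3.8 > 3.61 -> kept
    {"id": "a", "metadata": {"token_count": 1200, "score": 4.0,
                             "language_score": 0.95, "language": "en"}},
    # 3.8 * 0.95 equals the threshold exactly; ">" is strict -> dropped
    {"id": "b", "metadata": {"token_count": 1200, "score": 3.8,
                             "language_score": 0.95, "language": "en"}},
    # too long: 3500 > 3000 -> dropped
    {"id": "c", "metadata": {"token_count": 3500, "score": 5.0,
                             "language_score": 0.99, "language": "en"}},
    # not English -> dropped
    {"id": "d", "metadata": {"token_count": 800, "score": 4.5,
                             "language_score": 0.98, "language": "fr"}},
    # exactly 3000 tokens is allowed; 4.2 * 0.9 = 3.78 > 3.61 -> kept
    {"id": "e", "metadata": {"token_count": 3000, "score": 4.2,
                             "language_score": 0.9, "language": "en"}},
]

kept = [ex["id"] for ex in samples if my_filter(ex)]
print(kept)  # ['a', 'e']
```

With the Hugging Face `datasets` library, a predicate like this can be passed directly to `Dataset.filter`; whether the published files were produced exactly that way is not stated in the source.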
Each file contains approximately 31k rows.
Provided by:
styal



