mohammadkrb/fineweb2-HQ-persian
Source: Hugging Face. Updated 2025-11-14; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/mohammadkrb/fineweb2-HQ-persian
Resource description:
---
license: odc-by
language:
- fa
---
# Persian Subset of FineWeb2-HQ Dataset
This dataset is a derived subset of the **[FineWeb2-HQ dataset](https://huggingface.co/datasets/epfml/FineWeb2-HQ)**, filtered to include **only Persian texts** that are **shorter than 5000 characters**.
## Dataset Details
* **Original dataset:** FineWeb2-HQ
* **Filtered fields:** Only the `text` column is retained
* **Filter applied:** `len(text) < 5000`
* **Purpose:** Reduce dataset size for resource-limited environments and focus on Persian content
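
The filtering described above can be sketched in plain Python. This is a minimal illustration, not the script used to build the dataset: the helper name `filter_records`, the constant `MAX_CHARS`, and the sample records are hypothetical, and it assumes the length is measured with Python's `len` (Unicode code points), as the `len(text) < 5000` notation suggests.

```python
from typing import Iterable, Iterator

MAX_CHARS = 5000  # threshold stated in the card: len(text) < 5000


def filter_records(records: Iterable[dict], max_chars: int = MAX_CHARS) -> Iterator[dict]:
    """Yield {'text': ...} for each record whose text is shorter than max_chars."""
    for record in records:
        text = record.get("text", "")
        if len(text) < max_chars:
            # Drop every other column; only `text` is retained.
            yield {"text": text}


# Hypothetical sample rows standing in for the upstream Persian records.
sample = [
    {"text": "سلام دنیا", "id": "a"},
    {"text": "x" * 5000, "id": "b"},  # exactly 5000 characters: excluded
    {"text": "y" * 4999, "id": "c"},  # 4999 characters: kept
]
kept = list(filter_records(sample))
```

Note that the comparison is strict, so a text of exactly 5000 characters is dropped.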
**Attribution:**
This is a filtered subset of FineWeb2-HQ. Original dataset credit goes to the FineWeb2-HQ creators.
Provider:
mohammadkrb



