mohammadkrb/fineweb2-HQ-persian

Source: Hugging Face. Updated 2025-11-14; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/mohammadkrb/fineweb2-HQ-persian
Description:
---
license: odc-by
language:
- fa
---

# Persian Subset of FineWeb2-HQ Dataset

This dataset is a derived subset of the **[FineWeb2-HQ dataset](https://huggingface.co/datasets/epfml/FineWeb2-HQ)**, filtered to include **only Persian texts** and **texts shorter than 5000 characters**.

## Dataset Details

* **Original dataset:** FineWeb2-HQ
* **Filtered fields:** Only the `text` column is retained
* **Filter applied:** `len(text) < 5000`
* **Purpose:** Reduce dataset size for resource-limited environments and focus on Persian content

**Attribution:** This is a filtered subset of FineWeb2-HQ. Original dataset credit goes to the FineWeb2-HQ creators.
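The filter described in the dataset card (`len(text) < 5000`, with only the `text` column retained) can be sketched in plain Python as below. This is a minimal illustration, not the author's actual processing script: the function name `filter_short_texts`, the constant `MAX_CHARS`, and the sample rows (including the `id` field) are hypothetical. It assumes each row is a dict with a `text` key.

```python
from typing import Iterable, Iterator

# Threshold taken from the dataset card: len(text) < 5000.
MAX_CHARS = 5000


def filter_short_texts(
    rows: Iterable[dict], max_chars: int = MAX_CHARS
) -> Iterator[dict]:
    """Yield rows reduced to the `text` column, keeping texts under max_chars.

    Hypothetical helper for illustration; the card does not publish its script.
    """
    for row in rows:
        text = row["text"]
        # Python's len() counts Unicode code points, not bytes, so Persian
        # characters each count as one character here.
        if len(text) < max_chars:
            yield {"text": text}


# Made-up sample rows; the `id` field only shows that extra columns are dropped.
sample = [
    {"text": "سلام دنیا", "id": "a"},
    {"text": "x" * 5000, "id": "b"},  # exactly 5000 characters: excluded
    {"text": "y" * 4999, "id": "c"},  # 4999 characters: kept
]
kept = list(filter_short_texts(sample))
```

The comparison is strict, so a text of exactly 5000 characters is dropped. Whether the original pipeline measured length in code points or in some other unit is not stated in the card; the code-point interpretation here is an assumption.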
Provider:
mohammadkrb