akhooli/fineweb2_ar_300k
Source: Hugging Face. Updated 2025-01-06; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/akhooli/fineweb2_ar_300k
Description:
This is an Arabic FineWeb2 dataset covering the first 5M of the content, filtered by language score (only items scoring above 0.9 are kept). Each example includes the text content, its related URL, and text scores, among other fields, which makes the dataset suitable for training natural language processing models. It has a single training split with 322,895 examples.
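The language-score filter described above can be illustrated with a small sketch. This is not the author's actual pipeline: the field names `text`, `url`, and `language_score`, the helper `filter_by_language_score`, and the sample records are all hypothetical, and the sketch assumes "first 5M" means that only the first 5M source records were examined before filtering.

```python
from typing import Dict, Iterable, Iterator, Optional


def filter_by_language_score(
    records: Iterable[Dict],
    threshold: float = 0.9,
    limit: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield records whose language score is strictly above `threshold`.

    Only the first `limit` input records are examined (all of them if None).
    The key "language_score" is an assumed field name.
    """
    for index, record in enumerate(records):
        if limit is not None and index >= limit:
            break
        if record["language_score"] > threshold:
            yield record


# Made-up sample records, for illustration only.
sample = [
    {"text": "نص عربي واضح", "url": "https://example.com/a", "language_score": 0.97},
    {"text": "mixed نص text", "url": "https://example.com/b", "language_score": 0.62},
    {"text": "مثال آخر", "url": "https://example.com/c", "language_score": 0.93},
]

kept = list(filter_by_language_score(sample, threshold=0.9, limit=5_000_000))
print(len(kept))  # number of sample records that pass the filter
```

Under these assumptions, the 322,895 training examples would be whatever remains of the first 5M records after the score threshold is applied.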
Provider:
akhooli



