atlasia/FineWeb2-Moroccan-Arabic-Predictions-0.6
Source: Hugging Face | Updated: 2025-01-03 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/atlasia/FineWeb2-Moroccan-Arabic-Predictions-0.6
Description:
This dataset contains Moroccan Darija samples extracted from the FineWeb2 dataset using a custom model trained to identify Arabic dialects. Only samples with a high confidence score (above 0.6) for Moroccan Darija are retained. The dataset aims to advance research and development in Moroccan Darija NLP tasks.
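The selection rule described above is a simple confidence-threshold filter. The following is a minimal Python sketch, assuming each candidate sample already carries the dialect classifier's Moroccan Darija score. The field names `text` and `darija_score` and the function `filter_by_confidence` are hypothetical placeholders, not the dataset's actual schema or atlasia's pipeline code, and the custom dialect classifier itself is not shown.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical record layout: "text" and "darija_score" are placeholder
# field names, not the dataset's documented columns.
Sample = Dict[str, Any]

# Cutoff stated in the dataset description ("above 0.6").
THRESHOLD = 0.6


def filter_by_confidence(samples: Iterable[Sample], threshold: float = THRESHOLD) -> List[Sample]:
    """Keep only samples whose Moroccan Darija score is strictly above the threshold."""
    return [s for s in samples if float(s["darija_score"]) > threshold]


if __name__ == "__main__":
    # Made-up example scores, used only to illustrate the filter.
    candidates = [
        {"text": "sample a", "darija_score": 0.92},
        {"text": "sample b", "darija_score": 0.60},
        {"text": "sample c", "darija_score": 0.35},
        {"text": "sample d", "darija_score": 0.71},
    ]
    kept = filter_by_confidence(candidates)
    print([s["text"] for s in kept])  # prints ['sample a', 'sample d']
```

The comparison is strict (`>`), matching "above 0.6", so a sample scored exactly 0.6 is dropped.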
Provided by:
atlasia



