salaheddinealabouch/moroccan_darija_domain_classifier_dataset
Source: Hugging Face. Updated 2025-02-14; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/salaheddinealabouch/moroccan_darija_domain_classifier_dataset
Description:
The Moroccan Darija Domain Classifier Dataset is a synthetic dataset for text classification in Moroccan Darija. It was generated with the Gemini-2.0-Flash model and is intended for fine-tuning ModernBERT or similar transformer-based models. It contains text samples from multiple domains and is suitable for a range of NLP applications. Each entry pairs a Moroccan Darija text sample with a predefined domain label. The dataset has not been validated, so some samples may not be fully accurate Moroccan Darija.
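
Since each entry pairs a text sample with a domain label, a fine-tuning pipeline usually needs the string labels converted to integer class ids first. The following is a minimal sketch of that step only, not the dataset author's code. The field names `text` and `label`, the domain names, and the placeholder rows are all assumptions made for illustration; the listing does not state the actual column names or label set.

```python
from collections import Counter


def build_label_maps(records, label_key="label"):
    """Map each distinct domain label to an integer id, and back.

    `label_key` is a hypothetical field name; adjust it to the real column.
    """
    # Sort so the id assignment is deterministic across runs.
    labels = sorted({r[label_key] for r in records})
    label2id = {name: i for i, name in enumerate(labels)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


# Placeholder rows standing in for real entries (hypothetical fields and domains).
records = [
    {"text": "<Darija sample 1>", "label": "sports"},
    {"text": "<Darija sample 2>", "label": "health"},
    {"text": "<Darija sample 3>", "label": "sports"},
]

label2id, id2label = build_label_maps(records)

# Integer class ids, in the same order as the records.
encoded = [label2id[r["label"]] for r in records]

# Per-domain counts, useful for spotting class imbalance in synthetic data.
counts = Counter(r["label"] for r in records)
```

The two dictionaries are the kind of mapping a sequence-classification model configuration typically takes, and the per-domain counts give a quick check on class balance before training. Because the dataset is unvalidated, a manual review of a sample from each domain may also be worthwhile.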
Provider:
salaheddinealabouch



