Nehdi/TuniziBigBench
Source: Hugging Face | Updated: 2024-11-18 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Nehdi/TuniziBigBench
Description:
The TuniziBigBench dataset was created by scraping over 14,000 Tunisian YouTube videos, providing a rich repository of Tunisian language data. It covers a wide range of topics and text types, including politics, news, football, and more. The dataset is particularly valuable for training and fine-tuning natural language processing models specific to Tunisian Arabic and other local dialects. The dataset is structured with fields for text content and is released under the CreativeML Open RAIL-M license.
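As a minimal sketch of how this data might be pulled into an NLP training pipeline, the snippet below streams records with the Hugging Face `datasets` library (`load_dataset(..., streaming=True)`) and yields the non-empty text values. It assumes a `train` split and a text field named `text`. This listing confirms neither name, so check the dataset card and schema on the Hub first. The helpers `iter_texts` and `preview` are hypothetical names chosen for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Assumption: each record stores its text under a "text" key.
# The listing only says the dataset has "fields for text content",
# so verify the real field name on the Hub before relying on it.
TEXT_FIELD = "text"


def iter_texts(records: Iterable[dict], field: str = TEXT_FIELD) -> Iterator[str]:
    """Yield stripped, non-empty strings from `field`; skip missing or non-string values."""
    for record in records:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            yield value.strip()


def preview(limit: int = 5, split: str = "train") -> List[str]:
    """Stream the first `limit` texts of the dataset from the Hugging Face Hub.

    Requires the `datasets` package and network access. The split name
    "train" is an assumption, not something the listing confirms.
    """
    # Imported lazily so that iter_texts can be used offline without `datasets`.
    from datasets import load_dataset

    stream = load_dataset("Nehdi/TuniziBigBench", split=split, streaming=True)
    return list(islice(iter_texts(stream), limit))
```

Streaming avoids downloading the full scrape before inspecting it. Once the split and field names are confirmed, `preview(3)` should return up to three cleaned text strings.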
Provided by:
Nehdi



