akadriu/lajme-shqip
Source: Hugging Face. Updated: 2026-04-26. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/akadriu/lajme-shqip
Description:
A large-scale Albanian-language news corpus containing 1,095,000 articles (~868 MB) scraped from news portals across Albania, Kosovo, and North Macedonia over a period of nearly one year. This dataset aims to address the scarcity of Albanian NLP resources by providing a substantial text corpus for training and evaluating language models. Albanian is a low-resource language with limited publicly available datasets for NLP research. This corpus helps bridge that gap by offering diverse, real-world text data from multiple news sources and categories.
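For readers who want to inspect the corpus before downloading all of it, the following is a minimal sketch and not an official loading recipe. It assumes the Hugging Face `datasets` library is installed and that the repository loads with the default configuration. The helper names `mean_article_bytes` and `peek` are hypothetical. Split and column names are not stated in the description, so the code discovers them at run time instead of assuming them. The size estimate also assumes that "MB" means 10^6 bytes.

```python
from itertools import islice

# Repository id as it appears in the listing above.
DATASET_ID = "akadriu/lajme-shqip"


def mean_article_bytes(total_bytes: int, n_articles: int) -> float:
    """Average size of one article, from the figures quoted in the description."""
    return total_bytes / n_articles


def peek(n: int = 3) -> None:
    """Stream the first n records of the first available split and print their field names.

    Assumption: the repository loads with the default configuration. Split and
    column names are unknown here, so they are read from the loaded object.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    # streaming=True avoids downloading the full corpus up front.
    splits = load_dataset(DATASET_ID, streaming=True)
    first_split = next(iter(splits))
    for record in islice(splits[first_split], n):
        print(first_split, sorted(record.keys()))


# Rough per-article size, assuming 868 MB means 868 * 10**6 bytes.
AVG_BYTES = round(mean_article_bytes(868 * 10**6, 1_095_000))
```

Calling `peek()` requires network access. Under the stated assumption about units, `AVG_BYTES` comes out at roughly 790 bytes per article, which suggests that most entries are short news items.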
Provider:
akadriu



