AlbNews
Source: arXiv; updated 2024-02-06; indexed 2024-06-21
Download link:
http://hdl.handle.net/11234/1-5411
Description:
AlbNews is a dataset of Albanian-language news headlines created at the University of Vienna. It contains 600 labeled and 2,600 unlabeled headlines, collected by scraping article titles from online news portals. The labels cover four topic categories: politics, culture, economy, and sports. The dataset was built through data cleaning and manual annotation, and it is intended to advance research on topic modeling and classification of Albanian text. It is particularly suited to the challenges that low-resource languages pose in natural language processing.
Provider:
University of Vienna
Created:
2024-02-06



