Amharic News Text classification Dataset
Source: arXiv · Updated: 2021-03-11 · Indexed: 2024-06-21
Download link:
https://github.com/user/repo
Overview:
The Amharic News Text classification Dataset was developed at Addis Ababa Science and Technology University. It contains over 50,000 news articles labeled with 6 categories and covers both local and international news across domains such as politics, sports, and business. To build it, the researchers collected articles from multiple news websites, then manually verified the data and removed noise. The dataset is intended mainly for Amharic text classification and provides a valuable resource for NLP research on low-resource languages.
Provided by:
Addis Ababa Science and Technology University
Created:
2021-03-11



