amanuelbyte/Amharic_dataset
Source: Hugging Face. Updated: 2025-07-25. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/amanuelbyte/Amharic_dataset
Description:
The dataset consists of news articles written in Amharic. The articles primarily cover December 2020 to January 2021 (Ethiopian Calendar: Tahisas 2013 to Tir 2013). The data is provided as lines of text in a `.txt` corpus. Possible uses include pretraining Amharic language models, text summarization, NLP research, historical and social science research, and machine translation. Three limitations apply. The corpus is raw text and may need cleaning to fix formatting issues. No explicit license is provided, so users should verify copyright and usage guidelines before using the data. The dataset is small and may not be sufficient for training large machine learning models.
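Since the corpus is described as raw text lines that may need cleaning, the following is a minimal Python sketch of one possible cleaning pass. It is an illustration, not part of the dataset: the function names `clean_lines` and `load_corpus` and the file name `amharic_corpus.txt` are hypothetical, and the chosen steps (Unicode NFC normalization, whitespace collapsing, dropping empty and exactly duplicated lines) are assumptions about what "formatting issues" might involve.

```python
import re
import unicodedata
from pathlib import Path
from typing import Iterable, Iterator, List

# Matches any run of Unicode whitespace (spaces, tabs, newlines, ...).
_WHITESPACE = re.compile(r"\s+")


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield normalized, non-empty, de-duplicated lines in their original order."""
    seen = set()
    for raw in lines:
        # Normalize to NFC so visually identical strings compare equal.
        text = unicodedata.normalize("NFC", raw)
        # Collapse whitespace runs to a single space and trim both ends.
        text = _WHITESPACE.sub(" ", text).strip()
        # Skip blank lines and exact repeats of an earlier line.
        if not text or text in seen:
            continue
        seen.add(text)
        yield text


def load_corpus(path: str) -> List[str]:
    """Read a UTF-8 text file and return its cleaned lines.

    `path` is a placeholder; the actual file name in the dataset is not
    stated in the description.
    """
    with Path(path).open(encoding="utf-8") as handle:
        return list(clean_lines(handle))
```

A call such as `load_corpus("amharic_corpus.txt")` would return the cleaned lines as a list. Exact-match de-duplication is a deliberately conservative choice; near-duplicate detection or language filtering may be worth adding depending on the downstream task.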
Provider:
amanuelbyte



