Amharic instruction fine-tuning dataset

Name: Amharic instruction fine-tuning dataset
Creator: Masakhane NLP, Ethio NLP
Published: 2024-04-29 15:14:51
License: No description available

arXiv; updated 2024-04-29; indexed 2024-06-21

Download link:

https://huggingface.co/EthioNLP

Resource overview:

This study focuses on improving the Amharic-LLaMA model by integrating task-specific and generative datasets to raise the performance of Amharic language models. The dataset, named Amharic instruction fine-tuning dataset, was created by Masakhane NLP and Ethio NLP. It contains 122,637 instances and is intended primarily for instruction fine-tuning of Amharic language models. It covers a range of NLP tasks, including sentiment analysis and machine translation, and was built by converting existing datasets into instruction format. The creation process involved data collection, conversion, and validation. The aim is to address the difficulty of adapting language models to low-resource languages, particularly for conversational interaction.
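The conversion of existing task datasets into instruction format can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `instruction`/`input`/`output` field names, the English prompt wording, and the helper names are hypothetical and are not taken from the dataset or from the authors' pipeline.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InstructionExample:
    # Hypothetical record layout; the real dataset's schema may differ.
    instruction: str
    input: str
    output: str


# Hypothetical prompt templates, written in English for readability.
SENTIMENT_INSTRUCTION = (
    "Classify the sentiment of the following Amharic text "
    "as positive, negative, or neutral."
)
TRANSLATION_INSTRUCTION = "Translate the following text from {src} to {tgt}."


def sentiment_to_instruction(text: str, label: str) -> InstructionExample:
    """Wrap a (text, label) classification pair as an instruction example."""
    return InstructionExample(SENTIMENT_INSTRUCTION, text, label)


def translation_to_instruction(
    source: str, target: str, src_lang: str, tgt_lang: str
) -> InstructionExample:
    """Wrap a parallel sentence pair as an instruction example."""
    instruction = TRANSLATION_INSTRUCTION.format(src=src_lang, tgt=tgt_lang)
    return InstructionExample(instruction, source, target)


def to_jsonl_line(example: InstructionExample) -> str:
    """Serialize one example as a JSON line, keeping Ethiopic script readable."""
    return json.dumps(asdict(example), ensure_ascii=False)


if __name__ == "__main__":
    print(to_jsonl_line(sentiment_to_instruction("ጥሩ ፊልም ነው", "positive")))
    print(to_jsonl_line(translation_to_instruction("Hello", "ሰላም", "English", "Amharic")))
```

Keeping one small wrapper per task leaves the source datasets untouched and lets each task carry its own prompt wording, while every converted record shares a single layout for fine-tuning.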

Provided by:

Masakhane NLP, Ethio NLP

Created:

2024-02-13
