Amharic instruction fine-tuning dataset

Source: arXiv; updated 2024-04-29; indexed 2024-06-21
Download link:
https://huggingface.co/EthioNLP
Summary:
This study focuses on improving the Amharic-LLaMA model by integrating task-specific and generative datasets. The dataset, named the Amharic instruction fine-tuning dataset, was created by Masakhane NLP and Ethio NLP and contains 122,637 instances, intended primarily for instruction fine-tuning of Amharic language models. It covers a range of natural language processing (NLP) tasks, including sentiment analysis and machine translation, and was built by converting existing datasets into an instruction format. The creation process involved data collection, conversion, and validation, and it aims to address the difficulty of adapting language models to low-resource languages, particularly for conversational use.
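As a rough illustration of what "converting existing datasets into an instruction format" can look like, the sketch below wraps one labeled sentiment example in an instruction/input/output record. The field names, the English prompt wording, and the helper `to_instruction_record` are assumptions made for illustration; this listing does not specify the dataset's actual schema or templates.

```python
from typing import Dict

# Hypothetical template: the dataset's real prompt wording is not given here.
SENTIMENT_INSTRUCTION = (
    "Classify the sentiment of the following text as positive, negative, or neutral."
)


def to_instruction_record(text: str, label: str) -> Dict[str, str]:
    """Wrap one labeled sentiment example in an instruction-style record.

    The instruction/input/output layout is an assumed, commonly used
    convention, not a confirmed description of this dataset.
    """
    return {
        "instruction": SENTIMENT_INSTRUCTION,
        "input": text.strip(),
        "output": label.strip().lower(),
    }


# Placeholder text stands in for a real Amharic sentence.
example = to_instruction_record("  <Amharic sentence here>  ", "Positive")
print(example)
```

The same pattern would extend to other tasks, such as machine translation, by swapping the instruction template and mapping the source and target sentences to the input and output fields.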
Provider:
Masakhane NLP, Ethio NLP
Created:
2024-02-13