ClusterlabAi/InstAr-500k

Name: ClusterlabAi/InstAr-500k
Creator: ClusterlabAi
Published: 2024-07-30 16:41:57
License: Apache-2.0

Source: Hugging Face (updated 2024-07-30; indexed 2024-07-22)

Download link:

https://hf-mirror.com/datasets/ClusterlabAi/InstAr-500k


Description:

InstAr-500k is a dataset of almost 500,000 Arabic instructions and responses designed for fine-tuning large language models (LLMs) on Arabic NLP tasks. It combines synthetic and human-crafted data across multiple domains and instruction types, with the aim of improving LLM performance on Arabic-specific tasks. The dataset features include instruction, output, source, task, system, uuid, and topic. It is licensed under Apache-2.0 and is available as a train split of 481,281 examples, totaling 1,090,145,730 bytes. Supported task categories are question-answering, summarization, and text-classification. The language is Arabic, and the size category is 100K to 1M examples.
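As a minimal sketch of how the fields listed above might be used for fine-tuning, the following Python snippet loads the train split with the Hugging Face `datasets` library and turns one record into a single prompt string. The repository id and field names come from the description; the prompt template, the helper names `format_example` and `load_instar`, and the assumption that each field is a string (possibly empty or missing) are illustrative choices, not something the dataset prescribes.

```python
from typing import Mapping, Optional


def format_example(record: Mapping[str, Optional[str]]) -> str:
    """Build one training prompt from a record (hypothetical template).

    Assumes the "system", "instruction", and "output" fields are strings
    and that any of them may be empty or None.
    """
    system = (record.get("system") or "").strip()
    instruction = (record.get("instruction") or "").strip()
    output = (record.get("output") or "").strip()

    parts = []
    if system:
        # Only emit a system section when the record actually has one.
        parts.append(f"### System:\n{system}")
    parts.append(f"### Instruction:\n{instruction}")
    parts.append(f"### Response:\n{output}")
    return "\n\n".join(parts)


def load_instar(split: str = "train"):
    """Load the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so format_example works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("ClusterlabAi/InstAr-500k", split=split)


if __name__ == "__main__":
    ds = load_instar()
    print(len(ds))                # number of examples in the split
    print(format_example(ds[0]))  # first record rendered as a prompt
```

The other listed fields (source, task, topic, uuid) are left untouched here; they could be used to filter or stratify the examples before formatting.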

Provided by:

ClusterlabAi
