adlbh/medinstruct-52k-arabic
Source: Hugging Face · Updated 2024-05-27 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/adlbh/medinstruct-52k-arabic
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: origin_index
    dtype: int64
  splits:
  - name: train
    num_bytes: 100597846
    num_examples: 52002
  download_size: 46888896
  dataset_size: 100597846
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Description
This dataset is an Arabic translation of MedInstruct-52k. It contains 52,002 instruction-input-output triples in a single `train` split.
The translations were obtained using the Google Translate API.
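A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown in the page title. The names `check_record` and `load_train_split` are hypothetical helpers introduced here for illustration; `check_record` only verifies that a row matches the four features declared in the metadata.

```python
# Field names and Python types corresponding to the features in the card metadata.
EXPECTED_FEATURES = {
    "instruction": str,
    "input": str,
    "output": str,
    "origin_index": int,
}


def check_record(record):
    """Return True if `record` has exactly the expected fields with the expected types."""
    if set(record) != set(EXPECTED_FEATURES):
        return False
    return all(
        isinstance(record[name], kind) for name, kind in EXPECTED_FEATURES.items()
    )


def load_train_split(repo_id="adlbh/medinstruct-52k-arabic"):
    """Download the single `train` split (requires network access)."""
    # Imported lazily so that check_record can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")


# Example usage (not executed here because it needs network access):
#   train = load_train_split()
#   print(train.num_rows)
#   print(check_record(train[0]))
```

Running the check on a few rows after download is a cheap way to confirm that the local copy matches the declared schema before starting a training job.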
# Tasks
The MedInstruct-52k-arabic dataset is designed for instruction fine-tuning of pretrained language models on Arabic biomedical text generation.
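For instruction fine-tuning, each row is typically flattened into a prompt and a target completion. The sketch below borrows the English prompt template from the Stanford Alpaca project; this card does not say which template to use, so treat both the template and the helper name `build_example` as assumptions. It also assumes that an empty `input` string marks rows without extra context, which the card does not confirm.

```python
# Alpaca-style templates (an assumption; an Arabic template may suit this data better).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_example(record):
    """Turn one dataset row into a (prompt, completion) pair."""
    # Assumption: rows without context carry an empty or whitespace-only `input`.
    has_input = bool(record["input"].strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=record["instruction"], input=record["input"]
    )
    return prompt, record["output"]
```

If the dataset encodes missing inputs with a sentinel string instead of an empty value, the `has_input` check would need to be adjusted accordingly.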
# Description of the Original Data
The original [MedInstruct-52k](https://github.com/XZhang97666/AlpaCare/tree/master/data) was created by [X. Zhang et al.](https://arxiv.org/abs/2310.14558), following the same construction process as the Alpaca instruction dataset.
Provider:
adlbh
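Summary of Original Information
Dataset overview
Dataset information
- Name: MedInstruct-52k-arabic
- Content: about 52,000 instruction-input-output triples in Arabic.
- Source: translated from the original MedInstruct-52k dataset using the Google Translate API.
Dataset features
- Feature names: instruction, input, output, origin_index
- Data types:
  - instruction: string
  - input: string
  - output: string
  - origin_index: int64
Dataset splits
- Train split:
  - File size: 100,597,846 bytes
  - Number of examples: 52,002
Dataset size
- Download size: 46,888,896 bytes
- Total dataset size: 100,597,846 bytes
Configuration
- Default configuration:
  - Data file path: data/train-*
  - Split: train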
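As a quick sanity check on the size figures listed above, the following sketch derives the average record size and the ratio of download size to dataset size. The constants are copied from the card metadata; the variable names are illustrative only.

```python
# Figures copied from the dataset card metadata.
NUM_EXAMPLES = 52_002
DATASET_SIZE_BYTES = 100_597_846
DOWNLOAD_SIZE_BYTES = 46_888_896

# Average size of one instruction-input-output record.
bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the full dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"{bytes_per_example:.0f} bytes per example on average")
print(f"download is {download_ratio:.1%} of the dataset size")
```

The average works out to a little under 2 KB per record, and the download is somewhat less than half of the full dataset size.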



