JosefAlbers/akemiH_MedQA_Reason
Source: Hugging Face. Updated 2024-06-03; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/JosefAlbers/akemiH_MedQA_Reason
Description:
---
language:
- en
license: cc-by-4.0
dataset_info:
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: output_reason
    dtype: string
  - name: summary
    dtype: string
  splits:
  - name: train
    num_bytes: 27615690
    num_examples: 10161
  download_size: 15427012
  dataset_size: 27615690
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
Used `mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed` to generate summaries from `akemiH/MedQA-Reason`:
```python
import os

import datasets
from mlx_lm import load, generate


def _summarize(example):
    # Build a Phi-3 chat prompt from the question and the reasoned answer.
    prompt = f"<|user|>\n{example['input'].strip()}\n{example['output_reason']}\n\nSummarize the keypoint of the above question-answer pair into one sentence.<|end|>\n<|assistant|>"
    # `model` and `tokenizer` are module-level globals loaded below.
    example['summary'] = generate(model, tokenizer, prompt, max_tokens=500)
    return example


# Use <|end|> as the end-of-sequence token so generation stops after the reply.
model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed", tokenizer_config={'eos_token':'<|end|>'})
ds = datasets.load_dataset('akemiH/MedQA-Reason', split='train')
ds = ds.map(_summarize)
# Upload the augmented dataset; HF_TOKEN is read from the environment.
ds.push_to_hub("JosefAlbers/akemiH_MedQA_Reason", split='train', private=True, token=os.getenv('HF_TOKEN'))
```
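To inspect the prompt without loading the model, the template from `_summarize` can be pulled into a pure function. This is only a sketch: `build_prompt` and the sample record are hypothetical, introduced here for illustration, and are not part of the original script.

```python
def build_prompt(example: dict) -> str:
    """Return the Phi-3 chat prompt used by _summarize for one record."""
    return (
        "<|user|>\n"
        f"{example['input'].strip()}\n"
        f"{example['output_reason']}\n\n"
        "Summarize the keypoint of the above question-answer pair into one sentence."
        "<|end|>\n<|assistant|>"
    )


# Hypothetical record, made up for illustration only (not taken from the dataset).
sample = {
    "input": "  Which vitamin deficiency causes scurvy?  ",
    "output_reason": "Vitamin C, because it is required for collagen synthesis.",
}
prompt = build_prompt(sample)
print(prompt)
```

Keeping the template separate makes it easy to spot-check a few prompts before running `ds.map` over all rows.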
Provider:
JosefAlbers
Summary of original information
Dataset overview
Basic information
- Language: English
- License: CC-BY-4.0
Dataset features
- input: string
- output: string
- output_reason: string
- summary: string
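The four string columns listed above can be checked against a row before it is used downstream. A minimal sketch, assuming rows arrive as plain dictionaries; `EXPECTED_COLUMNS`, `missing_or_non_string`, and both sample rows are hypothetical names and data introduced for illustration.

```python
EXPECTED_COLUMNS = ("input", "output", "output_reason", "summary")


def missing_or_non_string(record: dict) -> list[str]:
    """List the expected columns that are absent or not of type str."""
    return [
        name for name in EXPECTED_COLUMNS
        if not isinstance(record.get(name), str)
    ]


# Hypothetical rows for illustration; real rows would come from the dataset.
good = {"input": "q", "output": "a", "output_reason": "because", "summary": "s"}
bad = {"input": "q", "output": None, "output_reason": "because"}

print(missing_or_non_string(good))  # → []
print(missing_or_non_string(bad))   # → ['output', 'summary']
```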
Dataset splits
- Train split:
  - Number of examples: 10161
  - Data size: 27615690 bytes
  - Download size: 15427012 bytes
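As a rough reading of these figures, the short calculation below derives the average size per example and the ratio of download size to data size. It assumes the data size refers to the prepared dataset and the download size to the files fetched from the hub; the variable names are illustrative.

```python
num_examples = 10161
dataset_size = 27615690   # bytes, from the split metadata
download_size = 15427012  # bytes, from the split metadata

avg_bytes = dataset_size / num_examples
ratio = download_size / dataset_size

print(f"{avg_bytes:.0f} bytes per example")
print(f"download is {ratio:.0%} of the data size")
```

This works out to roughly 2.7 KB per example, with the download a little over half the data size.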
Configuration
- Default configuration:
  - Data file path: data/train-*



