EQUES/JMedBench-Train
Source: Hugging Face. Updated: 2024-11-15. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/EQUES/JMedBench-Train
Description:
This dataset is extracted and modified from a part of JMedBench. It contains 674,954 rows of Japanese and English medical text in a single column named text. Only the train subset is included, and each question-answering sample is merged into one sentence: English samples are formatted as Questions:{Question}Answer:{Answer} and Japanese samples as 質問:{Question}回答:{Answer}. No other modifications were applied. The dataset is primarily intended for the continual pretraining of medical large language models and is not recommended for other uses.
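The merge format described above can be illustrated with a minimal Python sketch. This is not the provider's actual preprocessing code: the function name `merge_qa` and the `lang` argument are hypothetical, and the templates are taken literally from the description, so any whitespace or newline separators the original pipeline may have used are not reproduced.

```python
def merge_qa(question: str, answer: str, lang: str) -> str:
    """Merge one question-answer pair into a single text string.

    Hypothetical helper: the templates are copied from the dataset
    description; separators between the parts are an assumption
    (none are shown in the description, so none are added here).
    """
    if lang == "en":
        return f"Questions:{question}Answer:{answer}"
    if lang == "ja":
        return f"質問:{question}回答:{answer}"
    raise ValueError(f"unsupported language code: {lang!r}")


# Illustrative inputs only; these strings are not taken from the dataset.
english_row = {"text": merge_qa("What is aspirin?", "An NSAID.", "en")}
japanese_row = {"text": merge_qa("アスピリンとは?", "NSAIDの一種です。", "ja")}
```

Each merged string would occupy one row of the text column, which matches the single-column layout described above.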
Provider:
EQUES



