
EQUES/JMedBench-Train

Source: Hugging Face. Updated 2024-11-15; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/EQUES/JMedBench-Train
Description:
This dataset is extracted and modified from part of JMedBench. It contains 674,954 rows of Japanese and English medical text in a single column named "text". Only the train subset is included, and each question-answering sample is merged into one sentence: English samples are formatted as Questions:{Question}Answer:{Answer} and Japanese samples as 質問:{Question}回答:{Answer}. No other modifications were applied. The dataset is intended primarily for the continual pretraining of medical large language models and is not recommended for other uses.
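
The merging step described above can be illustrated with a minimal sketch. The two templates are copied from the description; the function name, the parameter names, and the sample strings are hypothetical and are not taken from the dataset or from the authors' actual preprocessing code.

```python
# Templates as stated in the dataset description.
# The dictionary keys ("en", "ja") are an assumption made for this sketch.
TEMPLATES = {
    "en": "Questions:{question}Answer:{answer}",
    "ja": "質問:{question}回答:{answer}",
}


def merge_qa(question: str, answer: str, lang: str = "en") -> str:
    """Merge one question-answer pair into a single text string.

    Hypothetical helper: it only fills the template for the given
    language and applies no other modification.
    """
    return TEMPLATES[lang].format(question=question, answer=answer)


# Invented sample strings, for illustration only.
example_en = merge_qa("What is hypertension?", "High blood pressure.", "en")
example_ja = merge_qa("高血圧とは何ですか?", "血圧が高い状態です。", "ja")

print(example_en)
print(example_ja)
```

Each merged string would then correspond to one row of the "text" column, assuming the merge is a plain template fill as the description suggests.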
Provider:
EQUES