PerMedCQA
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/NaghmehAI/PerMedCQA
Description:
PerMedCQA is the first Persian benchmark for evaluating large language models (LLMs) on real-world, consumer-generated medical questions. It contains 68,138 question-answer (QA) pairs, refined from 87,780 raw records through a two-stage data cleaning process. The data covers diverse interaction structures and metadata fields. To keep medical domains balanced, the dataset is stratified into training, evaluation, and test sets. The target task is consumer medical question answering.
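The stratified split mentioned in the description can be illustrated with a minimal sketch. This is not the dataset authors' procedure: the `specialty` field name, the 80/10/10 ratios, and the toy records are all assumptions chosen for illustration, and the real dataset's columns and split sizes may differ.

```python
import random
from collections import defaultdict

def stratified_split(records, key, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/eval/test so each group keeps roughly the same share.

    `key` names the field to stratify on; `ratios` are hypothetical, not the
    proportions used for PerMedCQA.
    """
    rng = random.Random(seed)

    # Bucket records by the stratification field.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    train, dev, test = [], [], []
    for label in sorted(groups):
        items = groups[label]
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_dev = int(len(items) * ratios[1])
        train.extend(items[:n_train])
        dev.extend(items[n_train:n_train + n_dev])
        test.extend(items[n_train + n_dev:])  # remainder goes to test
    return train, dev, test

# Toy records with an assumed "specialty" field (not taken from the dataset).
labels = ["cardiology"] * 20 + ["dermatology"] * 10
records = [{"id": i, "specialty": s} for i, s in enumerate(labels)]
train, dev, test = stratified_split(records, key="specialty")
```

Splitting within each group, rather than over the whole pool, is what keeps a small specialty from being absent from the evaluation or test set by chance.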
Provider:
NaghmehAI



