LLMs in Genetics MCQs

Name: LLMs in Genetics MCQs
Creator: Science Data Bank
Published: 2026-04-16 06:37:02
License: No description available

DataCite Commons. Updated 2026-04-16; indexed 2026-05-05.

Download link:

https://www.scidb.cn/detail?dataSetId=ed25b7b073854280b644bf87ff3049c4

Description:

Project Overview

This dataset supports a study evaluating the performance and reliability of five large language models (LLMs), namely Gemini, Claude, ChatGPT, Copilot, and DeepSeek, in answering genetics multiple-choice questions (MCQs) in a medical education context. The study used 200 USMLE-style MCQs distributed across 20 genetics topics, with each LLM completing three independent testing sessions. Questions were additionally classified by Bloom's Taxonomy levels (1–4) to assess performance across cognitive complexity levels.

Questions.csv contains 200 rows, one per MCQ. Each question is tagged with a Bloom's Taxonomy level (1–4), and the five LLM columns (Gemini, Claude, ChatGPT, Copilot, DeepSeek) show the average score across 3 independent attempts, where 1.0 = always correct, 0.0 = always incorrect, and values in between (e.g., 0.67) indicate the model got it right in 2 out of 3 attempts.

Topics.csv contains 20 rows, one per genetics topic. For each of the five LLMs, there are three separate columns (A1, A2, A3) representing the percentage of correct answers in each individual testing session. This allows both within-model consistency (reliability across attempts) and between-model comparisons to be assessed at the topic level.

Together, the two files capture performance at two levels of granularity: question-level (with Bloom's classification) and topic-level (with attempt-by-attempt breakdown), enabling analysis of both accuracy and test-retest reliability across cognitive complexity levels and subject areas.
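The two analyses the description mentions (accuracy by Bloom's level from Questions.csv, and attempt-to-attempt consistency from Topics.csv) could be sketched roughly as below. This is a minimal illustration, not the study's method: the column names `Bloom`, `Topic`, and the `<Model>_A1` to `<Model>_A3` pattern are assumptions, since the exact CSV headers are not given here, and the inline rows are made-up placeholders rather than values from the dataset. The max-minus-min range is just one simple stand-in for a consistency measure.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical column names; check the real CSV headers before use.
BLOOM_COL = "Bloom"
TOPIC_COL = "Topic"


def mean_score_by_bloom(rows, models):
    """Average each model's per-question score within each Bloom level."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        level = int(row[BLOOM_COL])
        for model in models:
            grouped[level][model].append(float(row[model]))
    return {
        level: {model: mean(scores) for model, scores in by_model.items()}
        for level, by_model in sorted(grouped.items())
    }


def attempt_range_by_topic(rows, models):
    """Max minus min percentage across sessions A1..A3, per topic and model."""
    result = {}
    for row in rows:
        per_model = {}
        for model in models:
            # Assumed header pattern, e.g. "Gemini_A1", "Gemini_A2", "Gemini_A3".
            sessions = [float(row[f"{model}_A{i}"]) for i in (1, 2, 3)]
            per_model[model] = max(sessions) - min(sessions)
        result[row[TOPIC_COL]] = per_model
    return result


# Made-up rows for illustration only; these are NOT values from the dataset.
questions_csv = """Bloom,Gemini,Claude
1,1.0,0.67
1,0.0,1.0
2,0.67,0.33
"""
topics_csv = """Topic,Gemini_A1,Gemini_A2,Gemini_A3,Claude_A1,Claude_A2,Claude_A3
Example topic,90,80,100,70,70,70
"""

models = ["Gemini", "Claude"]
question_rows = list(csv.DictReader(io.StringIO(questions_csv)))
topic_rows = list(csv.DictReader(io.StringIO(topics_csv)))

by_bloom = mean_score_by_bloom(question_rows, models)
spread = attempt_range_by_topic(topic_rows, models)
print(by_bloom)
print(spread)
```

To run this against the real files, replace the `io.StringIO(...)` sources with `open("Questions.csv", newline="")` and `open("Topics.csv", newline="")`, pass all five model names, and adjust the assumed column names to match the actual headers.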

Provider:

Science Data Bank

Created:

2026-04-16
