SkyWhal3/PEX10-Eval-Harness
Source: Hugging Face | Updated: 2026-04-23 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/SkyWhal3/PEX10-Eval-Harness
Description:
PEX10-Eval is a benchmark dataset for evaluating AI systems on PEX10-related peroxisome biogenesis disorders. It contains 60 questions (50 baseline and 10 adversarial) and uses a 3-tier citation relevance classifier (`direct` / `indirect` / `off_topic`). It is the third benchmark in the ARIA rare-disease evaluation family and tests deep domain expertise on PEX10 (Peroxisomal Biogenesis Factor 10 / peroxin-10 / RNF69), a gene that encodes a RING-finger E3 ubiquitin ligase essential for peroxisomal matrix protein import. Biallelic PEX10 variants cause an autosomal recessive clinical continuum known as Zellweger Spectrum Disorder (ZSD). Through its question categories, difficulty distribution, and Snape-tier adversarial questions, the benchmark exposes common failure modes of general-purpose LLMs on rare-disease queries.
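The three citation relevance tiers named in the description can be pictured with a small tally. This is only an illustrative sketch: the listing does not document the dataset's schema or how the harness scores citations, so `CitationRelevance`, `tally_citations`, and `direct_fraction` are hypothetical names, and the "fraction of direct citations" summary is an assumption rather than the benchmark's actual metric.

```python
from collections import Counter
from enum import Enum


class CitationRelevance(Enum):
    """The three relevance tiers named in the dataset description."""
    DIRECT = "direct"
    INDIRECT = "indirect"
    OFF_TOPIC = "off_topic"


def tally_citations(labels):
    """Count citations per tier (hypothetical helper, not part of the harness).

    Raises ValueError if a label is not one of the three tiers.
    """
    counts = Counter(CitationRelevance(label) for label in labels)
    return {tier.value: counts.get(tier, 0) for tier in CitationRelevance}


def direct_fraction(labels):
    """Share of citations labelled `direct`; 0.0 for an empty list.

    Assumed summary statistic, chosen only for illustration.
    """
    labels = list(labels)
    if not labels:
        return 0.0
    return tally_citations(labels)["direct"] / len(labels)


example = ["direct", "indirect", "direct", "off_topic"]
print(tally_citations(example))
print(direct_fraction(example))
```

Modelling the tiers as an enum means an unexpected label fails loudly instead of being silently counted as a fourth category.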
Provider:
SkyWhal3



