PlantMarkerBench: A Multi-Species Benchmark for Evidence-Based Gene Marker Reasoning

Source: DataCite Commons | Updated: 2026-05-07 | Indexed: 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.20035293
Description:
This dataset is part of an anonymous submission to NeurIPS 2026. PlantMarkerBench is a multi-species benchmark for evaluating evidence-based gene marker reasoning over plant biology literature. It is built with an automated pipeline that combines large-scale literature retrieval, hybrid search, and structured evidence extraction, followed by human verification. The dataset covers four plant species: Arabidopsis, maize, rice, and tomato.

Each instance consists of a gene, a cell type, and a candidate evidence sentence, annotated with:
- validity of the marker evidence
- evidence type (expression, function, localization, indirect, or noise)
- support strength

The reasoning task requires a model to determine whether a given sentence supports the gene as a valid marker for the cell type, and to classify the type of evidence.

For review purposes, a representative sample (n=600) with balanced valid/invalid examples and diverse evidence types is provided. The full dataset is also included.

This dataset is intended for benchmarking scientific reasoning in language models, not for direct biological decision-making without expert validation.
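To make the task definition concrete, below is a minimal Python sketch of one annotated instance and of scoring model predictions against it. It is an illustration under stated assumptions, not the benchmark's official loader or evaluator: the field names (`gene`, `cell_type`, `sentence`, `is_valid`, `evidence_type`, `support_strength`) and the classes `MarkerInstance`, `Prediction`, and function `score` are hypothetical, the support-strength label scheme is not specified here and is kept as an opaque optional string, and the metric choice (accuracy on validity and evidence type, plus macro-F1 over the five evidence types) is an assumption. The toy gene names and sentences are placeholders, not dataset records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Evidence categories as listed in the dataset description.
EVIDENCE_TYPES = ("expression", "function", "localization", "indirect", "noise")


@dataclass(frozen=True)
class MarkerInstance:
    # Hypothetical field names; the released files may use different keys.
    gene: str
    cell_type: str
    sentence: str
    is_valid: bool                          # does the sentence support the gene as a marker?
    evidence_type: str                      # one of EVIDENCE_TYPES
    support_strength: Optional[str] = None  # label scheme unknown; kept opaque

    def __post_init__(self) -> None:
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence_type!r}")


@dataclass(frozen=True)
class Prediction:
    is_valid: bool
    evidence_type: str


def score(gold: Sequence[MarkerInstance], pred: Sequence[Prediction]) -> Dict[str, float]:
    """Validity accuracy, evidence-type accuracy, and evidence-type macro-F1
    (a hypothetical metric set, not the benchmark's official protocol)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned one-to-one")
    if not gold:
        raise ValueError("cannot score an empty set")
    n = len(gold)
    pairs = list(zip(gold, pred))
    validity_acc = sum(g.is_valid == p.is_valid for g, p in pairs) / n
    type_acc = sum(g.evidence_type == p.evidence_type for g, p in pairs) / n

    # Per-class F1 = 2TP / (2TP + FP + FN); skip classes absent from gold and predictions.
    f1s: List[float] = []
    for t in EVIDENCE_TYPES:
        tp = sum(g.evidence_type == t and p.evidence_type == t for g, p in pairs)
        fp = sum(g.evidence_type != t and p.evidence_type == t for g, p in pairs)
        fn = sum(g.evidence_type == t and p.evidence_type != t for g, p in pairs)
        if tp + fp + fn == 0:
            continue
        f1s.append(2 * tp / (2 * tp + fp + fn))

    return {
        "validity_acc": validity_acc,
        "evidence_type_acc": type_acc,
        "evidence_type_macro_f1": sum(f1s) / len(f1s) if f1s else 0.0,
    }


if __name__ == "__main__":
    # Toy placeholder records, not taken from the dataset.
    gold = [
        MarkerInstance("GENE_A", "root hair", "GENE_A is expressed in root hair cells.", True, "expression"),
        MarkerInstance("GENE_B", "guard cell", "GENE_B was cloned from a cDNA library.", False, "noise"),
    ]
    pred = [Prediction(True, "expression"), Prediction(False, "function")]
    print(score(gold, pred))
    # → {'validity_acc': 1.0, 'evidence_type_acc': 0.5, 'evidence_type_macro_f1': 0.3333333333333333}
```

Macro-F1 is shown alongside plain accuracy because it weights each evidence type equally regardless of how often it occurs; classes that appear in neither the gold labels nor the predictions are skipped rather than counted as zero.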
Provided by:
Zenodo
Created:
2026-05-05