LAB-Bench

Source: arXiv, indexed 2025-09-30

Download link:

https://huggingface.co/datasets/futurehouse/lab-bench

Description:

LAB-Bench is a set of more than 2,400 multiple-choice questions for evaluating how well AI systems perform practical biology research tasks: recalling and reasoning over the literature, interpreting figures and tables, accessing and navigating databases, and understanding and manipulating DNA and protein sequences. It is organized into components (LitQA2, FigQA, TableQA, SuppQA, DbQA, SeqQA, ProtocolQA, and cloning scenarios), each of which targets a different capability of language models in a biological context.
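
Because the benchmark consists of multiple-choice questions, an evaluation loop comes down to presenting lettered options and comparing the chosen letter with the correct one. The following is a minimal sketch of that idea, not the benchmark's official harness. The record layout (`question`, `ideal`, `distractors`), the helper names, and the toy question are all assumptions made for illustration and are not taken from the dataset.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass
class MCQuestion:
    """One multiple-choice item. Field names are hypothetical."""
    question: str
    ideal: str               # the correct answer text
    distractors: List[str]   # the incorrect answer texts


def format_question(q: MCQuestion, rng: random.Random) -> Tuple[str, str]:
    """Shuffle the options and return (prompt text, letter of the correct option)."""
    options = [q.ideal] + list(q.distractors)
    if len(options) > len(LETTERS):
        raise ValueError("too many options to label with single letters")
    rng.shuffle(options)  # avoid always placing the correct answer first
    lines = [q.question]
    lines += [f"({LETTERS[i]}) {opt}" for i, opt in enumerate(options)]
    correct_letter = LETTERS[options.index(q.ideal)]
    return "\n".join(lines), correct_letter


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predicted letters that match the correct letters."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no answers to score")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


if __name__ == "__main__":
    # Toy item invented for this sketch; it is not a LAB-Bench question.
    item = MCQuestion(
        question="Which base pairs with adenine in DNA?",
        ideal="Thymine",
        distractors=["Cytosine", "Guanine", "Uracil"],
    )
    prompt, answer = format_question(item, random.Random(0))
    print(prompt)
    print("correct letter:", answer)
```

In practice the records would come from the Hugging Face link above, for example through the `datasets` library. The subset names and column names should be checked against the dataset card rather than taken from this sketch.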
