LAB Bench: A Biology Benchmark Dataset for Language Models

Source: 超神经 (HyperAI) · Updated 2024-08-07 · Indexed 2024-12-14

Download link:

https://hyper.ai/cn/datasets/33018

Resource overview:

There is widespread optimism that frontier large language models (LLMs) and LLM-augmented systems could rapidly accelerate scientific discovery across disciplines. Many benchmarks already measure LLM knowledge and reasoning on textbook-style science questions, but very few evaluate language models on the practical tasks that scientific research requires, such as literature retrieval, protocol planning, and data analysis.

Created:

2024-07-22

Collected summary

Dataset introduction

Background and challenges

Background overview

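LAB Bench is a biology benchmark dataset for language models, released by the FutureHouse research team in 2024. It contains more than 2,400 multiple-choice questions and is designed to evaluate the practical capabilities of AI systems in biology research, such as literature retrieval, data interpretation, and database navigation, addressing a gap that existing benchmarks leave in the evaluation of scientific tasks.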

The content above was collected and summarized by 遇见数据集.