Genentech/assaybench

Name: Genentech/assaybench
Creator: Genentech
Published: 2026-04-29 05:53:29
License: not specified

Source: Hugging Face (updated 2026-04-29, indexed 2026-05-10)

Download link:

https://hf-mirror.com/datasets/Genentech/assaybench

Overview:

AssayBench is a benchmark for evaluating computational models on phenotypic CRISPR screen prediction, a core capability of the "virtual cell" paradigm. It contains 1,901 curated CRISPR screen entries derived from 1,565 unique screens in BioGRID ORCS (version 2025), spanning five major classes of cellular phenotypes. Given a textual description of a CRISPR screen experiment, the task is to predict a ranked list of the 100 genes most relevant to the observed phenotype. The dataset has two subsets: biogrid (the main benchmark) and LaTest (a held-out evaluation set of novel screens). Data fields include the dataset name, relevant genes, relevance scores, hit labels, cell line, cell type, and phenotype description, among others. The dataset was created through source data cleaning, phenotype annotation, replicate merging, augmentation, and quality control. Evaluation metrics are adjusted nDCG@k, Precision@k, and FDR@k.
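The ranking metrics named above can be illustrated with a minimal Python sketch. It is not the benchmark's official scorer. It assumes that hit labels form a set of gene symbols, that relevance scores are non-negative numbers keyed by gene, and that FDR@k is the fraction of the top-k predictions that are not hits. The listing does not describe the "adjustment" applied to nDCG@k, so the function below computes plain nDCG@k only. All function names and the example genes are hypothetical.

```python
import math
from typing import Mapping, Sequence, Set


def precision_at_k(ranked: Sequence[str], hits: Set[str], k: int) -> float:
    """Fraction of the top-k predicted genes that are labelled hits."""
    top = ranked[:k]
    if not top:
        return 0.0
    # Assumption: divide by the number of predictions actually made (at most k).
    return sum(gene in hits for gene in top) / len(top)


def fdr_at_k(ranked: Sequence[str], hits: Set[str], k: int) -> float:
    """Fraction of the top-k predicted genes that are NOT hits (assumed definition)."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(gene not in hits for gene in top) / len(top)


def ndcg_at_k(ranked: Sequence[str], relevance: Mapping[str, float], k: int) -> float:
    """Plain (unadjusted) nDCG@k; genes missing from `relevance` score 0."""
    # Discounted cumulative gain of the predicted ranking.
    dcg = sum(
        relevance.get(gene, 0.0) / math.log2(rank + 2)
        for rank, gene in enumerate(ranked[:k])
    )
    # Ideal DCG: the k largest relevance scores in descending order.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(score / math.log2(rank + 2) for rank, score in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy screen: two true hits among four predicted genes.
    predicted = ["TP53", "MYC", "KRAS", "EGFR"]
    hit_set = {"TP53", "KRAS"}
    scores = {"TP53": 3.0, "KRAS": 1.0}
    print("Precision@4:", precision_at_k(predicted, hit_set, 4))
    print("FDR@4:", fdr_at_k(predicted, hit_set, 4))
    print("nDCG@4:", ndcg_at_k(predicted, scores, 4))
```

For the actual benchmark, k would presumably go up to 100, matching the 100-gene ranked list the task asks for.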

Provider:

Genentech
