drug-seq-u2os-novartis
Novartis DRUG-seq U2OS MoABox Dataset Overview
Basic dataset information
- License: MIT
- Tags: biology
- Data files:
  - Split: train
  - Path: data/train-*
- Download size: 65,010,391,233 bytes
- Dataset size: 176,084,077,182 bytes
- Number of examples: 49,392
Dataset features
- counts: sequence, int32
- counts_norm: sequence, float32
- counts_log: sequence, float32
- counts_log_norm: sequence, float32
- gene_names: sequence, string
- control_counts: sequence, float32
- control_counts_norm: sequence, float32
- control_counts_log: sequence, float32
- control_counts_log_norm: sequence, float32
- delta_counts: sequence of sequences, float32
- delta_counts_norm: sequence of sequences, float32
- delta_counts_log: sequence of sequences, float32
- delta_counts_log_norm: sequence of sequences, float32
- cell_line: string
- perturbation: string
- compound_concentration: float64
- compound_unit: string
- compound_smiles: string
- mechanism: string
- moa: string
- biological_effect: string
- experimental_id: string
- timepoint: string
- text: string
- text_embeddings: sequence, float32
- chembert_embeddings: sequence, float32
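As a rough illustration of how the per-gene fields might be used together, the sketch below pairs `gene_names` with `counts` to rank genes in a single record. It assumes the two sequences are index-aligned (entry i of `counts` belongs to entry i of `gene_names`), which the field list suggests but does not state. The helper `top_genes`, the gene names, and all values in the toy record are hypothetical.

```python
def top_genes(record, k=2, field="counts"):
    """Return the k (gene, value) pairs with the highest value in `field`.

    Assumes record["gene_names"] and record[field] are index-aligned.
    """
    names = record["gene_names"]
    values = record[field]
    if len(names) != len(values):
        raise ValueError("gene_names and %s differ in length" % field)
    ranked = sorted(zip(names, values), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy record with made-up values; a real record would carry all fields listed above.
record = {
    "gene_names": ["GENE_A", "GENE_B", "GENE_C"],
    "counts": [3, 12, 0],
    "perturbation": "compound_X",
    "compound_concentration": 1.0,
}

top = top_genes(record, k=2)
print(top)
```

The same helper can be pointed at another per-gene field, such as `counts_norm`, through the `field` argument.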
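Dataset description
- Cell line: U-2 OS human osteosarcoma cell line
- Perturbation type: small-molecule perturbation
- Number of unique compounds: 3,742
- Doses: 4 different concentrations plus 0.0
- Number of genes: 59,594
- Platform: DRUG-seq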
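Additional information
- Normalized counts: total expression per cell is scaled to 1e4
- Control counts: mean expression of each gene across all control cells
- Delta values: difference between each sample's expression and the corresponding control mean
- SMILES strings and MoA annotations: taken from the ChEMBL database and curated by Novartis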
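The normalization, control-mean, and delta definitions above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pipeline actually used to build the dataset: the function names are hypothetical, the toy numbers are made up, and the delta is taken here on normalized values only.

```python
TARGET_SUM = 1e4  # total expression after scaling, as stated above


def normalize(counts, target_sum=TARGET_SUM):
    """Scale one sample's counts so that they sum to target_sum."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


def control_mean(control_samples):
    """Per-gene mean across control samples (equal-length lists)."""
    n = len(control_samples)
    return [sum(gene_values) / n for gene_values in zip(*control_samples)]


def delta(sample, control):
    """Per-gene difference between a sample and the control mean."""
    return [s - c for s, c in zip(sample, control)]


# Toy data: 3 genes, two control samples and one treated sample.
controls = [[10, 30, 60], [20, 20, 60]]
treated = [50, 25, 25]

control_counts_norm = control_mean([normalize(c) for c in controls])
treated_norm = normalize(treated)
delta_counts_norm = delta(treated_norm, control_counts_norm)

print(control_counts_norm)
print(delta_counts_norm)
```

A positive delta means the gene is expressed above the control mean in the treated sample, a negative delta means below.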
Citations
- Hadjikyriacou, A., Yang, C., Henault, M., et al. Novartis DRUG-seq U2OS MoABox Dataset. Novartis DRUG-seq GitHub Repository.
- Hadjikyriacou, A., Yang, C., Henault, M., Ge, R., Mansur, L., Lindeman, A., Russ, C., Renner, S., Hild, M., Jenkins, J., Gubser-Keller, C., Li, J., Ho, D. J., Neri, M., Sigoillot, F. D., & Ihry, R. (2025). Novartis/DRUG-seq U2OS MoABox Dataset (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14291446
- Li, J., Ho, D. J., Henault, M., Yang, C., Neri, M., Ge, R., Renner, S., Mansur, L., Lindeman, A., Tumkaya, T., Russ, C., Hild, M., Gubser Keller, C., Jenkins, J. L., Worringer, K. A., Sigoillot, F. D., & Ihry, R. J. (2021). DRUG-seq Provides Unbiased Biological Activity Readouts for Drug Discovery. bioRxiv. https://doi.org/10.1101/2021.06.07.447456




