
Is brightfield all you need for mechanism of action prediction? Image data, CellProfiler features and Grit scores

Source: DataCite Commons (updated 2025-04-01, added 2024-07-13)
Download link:
https://figshare.scilifelab.se/articles/dataset/Is_brightfield_all_you_need_for_mechanism_of_action_prediction_Image_data_CellProfiler_features_and_Grit_scores/21378906/2
### Dataset description

The image data provided here are from U2OS cells treated with compounds belonging to ten mechanism-of-action (MoA) classes. We chose MoAs that we believed would be reasonably separable and that had a sufficient number of compounds (n) associated with them in our assay. The 10 MoAs were: ATPase inhibitors (ATPase-i, n = 18); Aurora kinase inhibitors (AuroraK-i, n = 20); HDAC inhibitors (HDAC-i, n = 33); HSP inhibitors (HSP-i, n = 24); JAK inhibitors (JAK-i, n = 21); PARP inhibitors (PARP-i, n = 21); protein synthesis inhibitors (Prot.Synth.-i, n = 23); retinoid receptor agonists (Ret.Rec.Ag, n = 19); topoisomerase inhibitors (Topo.-i, n = 32); and tubulin polymerization inhibitors (Tub.Pol.-i, n = 20). This gives 231 compounds in total. The compounds were administered at a dose of 10 µM with a 48 h exposure, in 384-well plates. Each compound-level experiment was replicated six times, and the compounds were distributed across 18 microplates. Images (16-bit, 2160 × 2160 pixels) were captured with a 20X objective at five sites (fields of view) per well. The Cell Painting fluorescence (FL) data have five fluorescence channels, and the brightfield (BF) data have six evenly spaced z-planes.

### Organization of files

1) **Raw image data**: The image data for the 18 microplates (P015076, P015077, P015080, P015081, P015082, P015083, P015084, P015085, P015090, P015091, P015092, P015093, P015094, P015095, P015096, P015097, P015098, P015099) are stored in the corresponding compressed archives (tar.gz).
2) **Data tables**: data_tables.tar.gz. This archive contains the metadata for the FL images (fl_data.csv) and the BF images (bf_data.csv). For each image, the tables give the plate, well, site (field of view), compound and MoA. In fl_data.csv, columns C1 to C5 give the TIFF file names for the five fluorescence channels. In bf_data.csv, columns C1 to C6 correspond to the six z-planes.
Note that the site identifiers differ between the BF and FL data: BF sites [1, 2, 3, 4, 5] correspond to FL sites [2, 4, 5, 6, 8].
3) **CellProfiler pipelines**:
   - HMPSC_2_ICF_Polynom.cppipe: illumination correction pipeline (calculates the illumination correction function).
   - HMPSC_3_FEAT_ICFImg_Cellpose_v1_n50_c150_ft0.8.cppipe: feature extraction pipeline (applies the illumination correction function and extracts features).
4) **CellProfiler features**: CP_features.tar.gz. This archive contains the cell-level CellProfiler features that we used for benchmarking in our analysis (CP_features_cells.csv).
5) **Grit scores**: grit_scores.tar.gz. This archive contains the Grit scores and nuclear counts for the imaging sites (grit_scores.csv). These values are provided for every compound for which they could be computed (227 of the 231 compounds).

### Publications

The data in this repository support the following two publications:
1. "Is brightfield all you need for mechanism of action prediction?" by Harrison et al.
2. "Combining molecular and cell painting image data for mechanism of action prediction" by Tian et al.

#### Abstract for publication 1

Fluorescence staining techniques, such as Cell Painting, together with fluorescence microscopy have proven invaluable for visualizing and quantifying the effects that drugs and other perturbations have on cultured cells. However, fluorescence microscopy is expensive, time-consuming, and labor-intensive, and the stains applied can be cytotoxic, interfering with the activity under study. The simplest form of microscopy, brightfield microscopy, lacks these downsides, but the images it produces have low contrast and the cellular compartments are difficult to discern. Nevertheless, with deep learning, these brightfield images may still be sufficient for various predictive purposes.
In this study, we compared the predictive performance of models trained on fluorescence images with that of models trained on brightfield images for predicting the mechanism of action (MoA) of different drugs. We also extracted CellProfiler features from the fluorescence images and used them as a performance benchmark. Overall, we found comparable and correlated predictive performance for the two imaging modalities. This is promising for future studies of MoAs in time-lapse experiments.

#### Abstract for publication 2

The mechanism of action (MoA) of a compound describes the biological interaction through which it produces a pharmacological effect. Multiple data sources can be used to predict MoA, including compound structural information and various assays, such as those based on cell morphology, transcriptomics and metabolomics. In the present study, we explored the benefits and potential additive/synergistic effects of combining structural information, in the form of Morgan fingerprints, and morphological information, in the form of five-channel Cell Painting image data. For a set of 10 well-represented MoA classes, we compared the performance of deep learning models trained on each of the two datasets separately with that of a model trained on both datasets simultaneously. On a held-out test set, we obtained a macro-averaged F1 score of 0.58 when training on the structural data only, 0.81 when training on the image data only, and 0.92 when training on both together. These results indicate clear additive/synergistic effects and highlight the benefit of integrating multiple data sources for MoA prediction.
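
Publication 2 reports its results as macro-averaged F1 scores. The sketch below illustrates that metric using toy labels, not data from this repository. Macro-F1 computes F1 = 2TP / (2TP + FP + FN) separately for each MoA class and then takes the unweighted mean, so every class counts equally regardless of how many test samples it has. The `macro_f1` helper is hypothetical. It takes the classes as the union of the true and predicted labels, which to our understanding matches scikit-learn's `f1_score(..., average="macro")`. The publication's own evaluation code may differ.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # the predicted class gains a false positive
            fn[true_label] += 1  # the true class gains a false negative
    # Classes = union of true and predicted labels.
    classes = sorted(set(y_true) | set(y_pred))
    # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for every class in the union.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(per_class) / len(per_class)


# Toy labels (not from this dataset).
y_true = ["HDAC-i", "HDAC-i", "PARP-i", "JAK-i"]
y_pred = ["HDAC-i", "PARP-i", "PARP-i", "JAK-i"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.75
print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```

Here accuracy is 3/4 = 0.75, but macro-F1 is (2/3 + 2/3 + 1)/3 = 7/9 ≈ 0.778. The difference arises because the perfectly predicted JAK-i class carries a full third of the weight.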

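Use the BF/FL site correspondence given under Organization of files when pairing brightfield and fluorescence images of the same field of view. The sketch below uses only the Python standard library. It assumes, hypothetically, that bf_data.csv and fl_data.csv name their key columns `plate`, `well` and `site`, so check the actual headers before use. The helpers `read_rows` and `pair_bf_fl` are illustrative, and the rows in the example are made up.

```python
import csv
import io

# BF sites [1, 2, 3, 4, 5] correspond to FL sites [2, 4, 5, 6, 8]
# (mapping taken from the dataset description).
BF_TO_FL_SITE = {1: 2, 2: 4, 3: 5, 4: 6, 5: 8}


def read_rows(handle):
    """Read a metadata CSV (e.g. bf_data.csv or fl_data.csv) into a list of dicts."""
    return list(csv.DictReader(handle))


def pair_bf_fl(bf_rows, fl_rows):
    """Yield (bf_row, fl_row) pairs that image the same plate, well and field of view.

    ASSUMPTION: both tables have columns named "plate", "well" and "site".
    """
    # Index FL rows by (plate, well, FL site) for constant-time lookup.
    fl_index = {(r["plate"], r["well"], int(r["site"])): r for r in fl_rows}
    for bf in bf_rows:
        fl_site = BF_TO_FL_SITE.get(int(bf["site"]))
        if fl_site is None:
            continue  # BF site not covered by the documented mapping
        fl = fl_index.get((bf["plate"], bf["well"], fl_site))
        if fl is not None:
            yield bf, fl


# Made-up rows; the real tables also carry compound, MoA and C1..C5 (FL) / C1..C6 (BF).
bf_csv = io.StringIO("plate,well,site,C1\nP015076,B02,1,bf_a.tiff\nP015076,B02,3,bf_b.tiff\n")
fl_csv = io.StringIO("plate,well,site,C1\nP015076,B02,2,fl_a.tiff\nP015076,B02,5,fl_b.tiff\n")
pairs = list(pair_bf_fl(read_rows(bf_csv), read_rows(fl_csv)))
for bf, fl in pairs:
    print(bf["site"], "->", fl["site"], bf["C1"], fl["C1"])
# 1 -> 2 bf_a.tiff fl_a.tiff
# 3 -> 5 bf_b.tiff fl_b.tiff
```

Indexing the FL table once keeps the pairing linear in the number of rows, which matters for tables covering 18 plates of 384 wells with five sites each.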
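The per-class counts in the dataset description add up to the 231 compounds mentioned under Grit scores. The compounds that lack a Grit score can then be found by a set difference between the image metadata and grit_scores.csv. This is only a sketch. The `compound` column name, the `moa` and `grit` columns in the example, and the helper `compounds_without_grit` are all assumptions, and the example rows are invented.

```python
import csv
import io

# Compounds per MoA class, as listed in the dataset description.
MOA_COUNTS = {
    "ATPase-i": 18, "AuroraK-i": 20, "HDAC-i": 33, "HSP-i": 24, "JAK-i": 21,
    "PARP-i": 21, "Prot.Synth.-i": 23, "Ret.Rec.Ag": 19, "Topo.-i": 32,
    "Tub.Pol.-i": 20,
}
TOTAL_COMPOUNDS = sum(MOA_COUNTS.values())  # 231


def compounds_without_grit(metadata_rows, grit_rows, column="compound"):
    """Return the sorted compounds that appear in the image metadata but not in the Grit table.

    ASSUMPTION: both tables identify compounds in a column called `column`.
    """
    in_metadata = {row[column] for row in metadata_rows}
    in_grit = {row[column] for row in grit_rows}
    return sorted(in_metadata - in_grit)


# Invented rows. If the real metadata lists exactly the 231 compounds,
# the result should contain 231 - 227 = 4 compounds.
fl_rows = csv.DictReader(io.StringIO("compound,moa\ncpd_a,HDAC-i\ncpd_b,JAK-i\ncpd_c,PARP-i\n"))
grit_rows = csv.DictReader(io.StringIO("compound,grit\ncpd_a,1.2\ncpd_c,0.4\n"))
print(TOTAL_COMPOUNDS)                             # 231
print(compounds_without_grit(fl_rows, grit_rows))  # ['cpd_b']
```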
Provider: Uppsala University
Created: 2023-06-20