
Identifying cancer-associated leukocyte profiles using a high-resolution flow cytometry screening pipeline

Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/identifying-cancer-associated-screening-pipeline/2211066
Resource description:
Methods

## Animals

C57BL/6 (B6) and BALB/c (BC) female mice aged 6-10 weeks (from the Australian Phenomics Facility, ANU) were used in the study. Animals were housed in a specific-pathogen-free environment and used in strict adherence to protocols approved by the institutional Animal Experimentation Ethics Committee (AEEC), ANU, under protocol A2020/39. At experimental end points, animals were euthanised by cervical dislocation according to AEEC-approved procedures.

## Cell lines

The mammary carcinoma cell lines 4T1 [24] (ATCC), 4T1.2 [25] (kindly provided by Dr Robin Anderson, Olivia Newton-John Cancer Research Institute), 4T1Br4 [26] (kindly provided by Dr Normand Pouliot, Olivia Newton-John Cancer Research Institute) and AT-3-OVA [27] (kindly provided by Dr Di Yu); the colorectal carcinoma cell lines CT26 [28] (ATCC) and MC38 [29] (kindly provided by Dr Di Yu); and the melanoma cell line B16-F10 [30] (kindly provided by Dr Chris Parish) were used in this study. Cell lines were confirmed clear of specific pathogens by Cerberus Sciences (ISO 9001 Licence No. AU843-QC). Cell lines were cultured and subcultured as described previously [18] in supplemented RPMI-1640 (sRPMI; 11875093, ThermoFisher Scientific).

## Tumour establishment

Tumour cells (1 × 10^5) were injected subcutaneously in the right hind flank (primary tumour) and then 3 days later in the left hind flank (secondary tumour) of syngeneic mice (cell lines 4T1, 4T1.2, 4T1Br4 and CT26 in BC mice; cell lines AT-3-OVA, MC38 and B16-F10 in B6 mice), which were randomised across housing cages as described previously [18]. Tumours were left to grow for 17-21 days. At endpoint, mice were humanely sacrificed, and their tumours and spleens were excised and weighed.

## Blood and spleen collection and processing

Blood and spleens from mice were collected at experimental end point. Blood was collected and processed as described previously [31].
Spleens were harvested and processed to single cells as described previously [32], except that the red blood cell lysis step was not performed.

## Spleen cell barcoding

Spleen cells from three (of the 3-5) replicate mice bearing the largest mass of 14-17-day-old tumours from the 4T1, 4T1.2, 4T1Br4, AT-3-OVA, CT26, MC38 or B16-F10 cell lines, or from healthy controls (host B6 or BC mice), were pooled into nine separate tubes in a total of 10 mL of phosphate-buffered saline (PBS). Each of the nine spleen cell groups was adjusted to the equivalent of two normal spleen masses, based on spleen weights, by removing the appropriate volume of cell suspension from each tube. Cell volumes were then made up to 10 mL with PBS, the cells were sedimented by centrifugation (300 g for 10 min), the supernatant was aspirated and the cells were resuspended in 2.9 mL sRPMI. Each spleen cell group was then barcoded separately with a unique combination of carboxyfluorescein diacetate succinimidyl ester (CFSE) and/or cell trace violet (CTV) concentrations (S1 Table), and all groups were then pooled into one sample as previously described [33]. Cells were then suspended in 10 mL of sRPMI, passed through a 70 µm filter mesh and counted. A total of 400 × 10^6 leukocytes was then suspended in 10 mL sRPMI, passed through a 70 µm filter mesh and sedimented by centrifugation (300 g for 10 min), and the supernatant was aspirated, ready for immediate backbone antibody labelling.

### S1 Table: Barcoding vital dye cell-labelling concentrations

| Group | Sequence | CFSE concentration | CTV concentration |
| --- | --- | --- | --- |
| Nil BALB/c | 1 | 74 nM | 0 nM |
| CT26 | 2 | 74 nM | 1500 nM |
| 4T1 | 3 | 74 nM | 20300 nM |
| B16-F10 | 4 | 11 nM | 0 nM |
| MC38 | 5 | 11 nM | 1500 nM |
| AT3-OVA | 6 | 11 nM | 20300 nM |
| 4T1.Br4 | 7 | 0 nM | 0 nM |
| 4T1.2 | 8 | 0 nM | 1500 nM |
| Nil C57BL/6 | 9 | 0 nM | 20300 nM |

## Backbone antibody labelling

Barcoded pooled cells were resuspended in 0.6 mL of Labelling Buffer (PBS containing 5 mM EDTA and 1% (w/v) BSA) with 5 mg/mL TruStain FcX™ (anti-mouse CD16/32) antibody (101320, Biolegend) for 15 min at 4 °C.
Samples were then incubated with a backbone panel of antibodies (S2 Table) by adding 0.6 mL of Labelling Buffer with 10% (v/v) Brilliant Stain Buffer (BD) containing a 2× stock of each antibody, for 30 min at 4 °C. The pooled, barcoded and backbone antibody-labelled cells were then resuspended to 13.4 × 10^6 cells per mL (i.e. 1 × 10^6 cells per 75 µL) in Labelling Buffer and passed through a 70 µm filter mesh, ready for aliquoting into the wells of the LEGENDScreen plates.

### S2 Table: Antibodies

Isotype abbreviations: r = rat; ah = Armenian hamster.

| Panel | Antigen | Clone | Fluorochrome | Cat. # | Isotype | Source | 2× stock dilution factor |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Backbone | CD45 | 30-F11 | PerCP-Cy5.5 | 103132 | r-IgG2b, κ | Biolegend | 1/50 |
| Backbone | CD90.2 | 53-2.1 | PE-Cy7 | 105326 | r-IgG2b, κ | Biolegend | 1/200 |
| Backbone | CD4 | RM4-5 | AF-700 | 100536 | r-IgG2a, κ | Biolegend | 1/400 |
| Backbone | CD8a | 53-6.7 | BV650 | 100742 | r-IgG2a, κ | Biolegend | 1/50 |
| Backbone | PD-1 | 29F.1A12 | APC | 135210 | r-IgG2a, κ | Biolegend | 1/400 |
| Backbone | CD25 | PC61 | APC-F750 | 102054 | r-IgG2a, λ | Biolegend | 1/400 |
| Backbone | B220 | RA3-6B2 | AF-700 | 103232 | r-IgG2a, κ | Biolegend | 1/50 |
| Backbone | CD11c | N418 | APC | 117310 | ah-IgG | Biolegend | 1/100 |
| Backbone | CD11b | M1/70 | APCFire750 | 101262 | r-IgG2b, κ | Biolegend | 1/400 |
| Backbone | Ly-6C | HK1.4 | BV711 | 128037 | r-IgG2c, κ | Biolegend | 1/50 |
| Backbone | Ly-6G | 1A8 | BV650 | 127606 | r-IgG2a, κ | Biolegend | 1/100 |
| Backbone | F4/80 | BM8 | PE-Cy7 | 123114 | r-IgG2a, κ | Biolegend | 1/100 |
| Backbone | I-A/I-E (MHC-II) | M5/114.15.2 | BV605 | 107639 | r-IgG2b, κ | Biolegend | 1/50 |
| Backbone | PD-L1 | 10F.9G2 | PE-Dazzle594 | 124324 | r-IgG2b, κ | Biolegend | 1/50 |
| Backbone | Siglec-F | E50-2440 | BV786 | 740956 | r-IgG2a, κ | BD | 1/50 |
| Backbone | CD49b | DX5 | BUV395 | 740250 | ah-IgG1, κ | BD | 1/400 |
| Backbone | TCRb | H57-597 | BV605 | 109241 | ah-IgG | Biolegend | 1/50 |
| Backbone + screen markers | CD62L | MEL-14 | BV570 | 104433 | r-IgG2a, κ | Biolegend | 1/50 |
| Backbone + screen markers | CD44 | IM7 | BUV737 | 612799 | r-IgG2b, κ | BD | 1/50 |
| Backbone + screen markers | CD24 | M1/69 | PE | 101808 | r-IgG2b, κ | Biolegend | 1/200 |
| Backbone + screen markers | CD45RB | C363-16A | FITC | 103305 | r-IgG2a, κ | Biolegend | 1/100 |
| Backbone + screen markers | IgD | 11-26c.2a | BV421 | 405725 | r-IgG2a, κ | Biolegend | 1/100 |
| Backbone + screen markers | CD66a | Mab-CC1 | BV650 | 134529 | m-IgG1, κ | Biolegend | 1/100 |

## LEGENDScreen assay

A LEGENDScreen Mouse PE Kit (BioLegend) was used to screen spleen leukocytes for cancer-specific cell-surface markers.
Plates from the kit were prepared according to the manufacturer's instructions, with the lyophilised antibodies in each well of the assay plates resuspended in 25 µL of deionised H2O. The pooled, barcoded and backbone antibody-labelled cells were added at 75 µL (i.e. 1 × 10^6 cells) to each well containing the reconstituted antibodies and incubated in the dark for 30 min at 4 °C. Cells were then washed in the Legend Screen Wash provided in the kit, pelleted and resuspended in 0.04 mL Labelling Buffer containing 0.001 mg/mL of the viability dye Hoechst 33258 and the equivalent of 500 Flow-Count Fluorospheres (7547053, Beckman Coulter) per 0.04 mL, and stored at 4 °C overnight before flow cytometry.

## Immunophenotyping of blood leukocytes by flow cytometry

Blood samples (0.005 mL) were labelled with antibodies that included the backbone panel and the screen-identified antibodies (S2 Table) and prepared for flow cytometry analysis using methods described previously [31].

## Flow cytometry

Flow cytometry was performed on a BD X-20 (BD Bioscience) flow cytometer with FACSDiva software. Application Settings were applied to standardise fluorescence intensity readings between experiments, and fluorescence intensities were monitored using Sphero™ 8-peak Rainbow Beads (110620, BD Bioscience). Voltages were initially set up using unlabelled RBC-lysed blood leukocytes. BD CompBeads (552843, BD Bioscience) were used as compensation controls as previously described [31]. Blood cell samples were acquired until a total of 2000 Flow-Count Fluorosphere beads had been collected, based on gating of a side scatter (log) versus forward scatter (linear) plot. LEGENDScreen samples were acquired at 10,000 events/second using the sample fine adjust on a low sample flow rate to collect a total of ~1-3 × 10^5 live CD45+ cells.
Every 36th sample was followed by a 3 min run on a high sample flow rate with 10% sodium hypochlorite, then a 2 min run on a high sample flow rate with ddH2O, and the stability of fluorescence in each channel was assessed by acquiring Sphero™ 8-peak Rainbow Beads. Raw Flow Cytometry Standard (FCS) files of the data are available upon request at the ANU DATA COMMONS repository (https://dx.doi.org/10.25911/6153a8ab5747c).

## Flow cytometry analysis

Flow cytometry analysis was performed using FlowJo v10 software (BD Bioscience) and the R package CytoExploreR version 2.0.0 [34] (https://dillonhammill.github.io/CytoExploreR/). A combination of manual gating and unsupervised Pairwise Controlled Manifold Approximation Projection (PaCMAP) analysis was used to delineate cell populations and to assess how well the manually gated populations segregated. Cell groups were then named based on marker expression, represented by the median fluorescence intensity (MedFI) of each marker plotted as heat map dot plots made using the R package ggplot2 (https://ggplot2.tidyverse.org) (see Results section).

## Data normalisation and processing

### Blood leukocyte data

To reduce the influence of inter-experimental technical variability on the independent blood analysis experiments, their data were normalised at several levels. Firstly, cell numbers in each flow cytometry acquisition set were normalised to the counting beads spiked into the sample, with each sample normalised to 5000 Flow-Count Fluorospheres (5/5 of the spiked load), to give the number of cells in ~0.005 mL of blood (“counting bead normalised” values). Secondly, these normalised counts were normalised to the mean counts of the respective blood leukocytes from non-tumour-bearing control animals within each experiment, giving the “nil normalised values”.
To obtain “normalised cell counts” per 0.005 mL of blood (as an estimate of the overall cells across the groups), the “nil normalised values” were multiplied by the overall mean of the “counting bead normalised” values from all non-tumour-bearing animals for each cell population across all experiments.

### LEGENDScreen data

Changes in leukocyte marker expression in cancer samples were compared with healthy levels as follows. The background PE MedFI of each LEGENDScreen marker on each cell population in matched healthy controls was subtracted from the MedFI of the same marker on the same cell population in each tumour type. This MedFI difference was then divided by the maximum PE MedFI change of each marker for each population, and any values less than -1 were assigned -1. This gave a cancer-specific marker change scaled from -1 to 1 (with 0 being normal). These values were visualised as a heat map dot plot using the R package ComplexHeatmap [35].

## Supervised machine learning

Supervised machine learning was performed using Orange 3 software. Random Forest and CatBoost modelling used 100 trees for predictions or 500 for ranking feature importance, with a maximum tree depth of 4 (Random Forest) or 6 (CatBoost). For Random Forest, a maximum of 5 features was considered at each node and subsets smaller than 5 were not split. For CatBoost, the learning rate was 0.3, the regularisation lambda was 3 and subsampling was 1.

For classification of groups using monocytes, CatBoost was trained on 66% of randomly sampled data and tested on the remaining data; this was repeated 100 times and the predicted and actual classes were displayed as a confusion matrix. Feature ranking was done using both Random Forest and CatBoost (built into the models in Orange 3 software).
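
The repeated random-sampling evaluation described above can be sketched in plain Python. This is only an illustration under stated assumptions: the study ran CatBoost inside Orange 3, whereas the sketch substitutes a simple nearest-centroid classifier so that it runs with the standard library alone. The function names, the toy feature values and the use of unstratified sampling are all hypothetical and are not taken from the study's workflows.

```python
import random
from collections import Counter


def nearest_centroid_fit(rows, labels):
    """Placeholder model (NOT CatBoost): store one mean feature vector per class."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def repeated_split_confusion(rows, labels, train_fraction=0.66, repeats=100, seed=0):
    """Train on a random 66% and test on the rest, `repeats` times.

    Returns a Counter mapping (actual, predicted) to the number of test
    predictions summed over all repeats, i.e. a pooled confusion matrix.
    """
    rng = random.Random(seed)
    confusion = Counter()
    n_train = round(train_fraction * len(rows))
    for _ in range(repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)  # unstratified random sampling (an assumption)
        train, test = order[:n_train], order[n_train:]
        model = nearest_centroid_fit([rows[i] for i in train],
                                     [labels[i] for i in train])
        for i in test:
            confusion[(labels[i], nearest_centroid_predict(model, rows[i]))] += 1
    return confusion


# Toy data: two made-up features for six "Nil" and six "4T1" samples.
rows = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.0, 1.8], [0.8, 2.0],
        [5.0, 6.0], [5.2, 5.9], [4.9, 6.1], [5.1, 6.2], [5.0, 5.8], [4.8, 6.0]]
labels = ["Nil"] * 6 + ["4T1"] * 6

confusion = repeated_split_confusion(rows, labels)
for (actual, predicted), count in sorted(confusion.items()):
    print(f"actual={actual} predicted={predicted} count={count}")
```

Pooling the test predictions from every repeat into a single table mirrors how one confusion matrix can summarise 100 resampling rounds; whether Orange 3 pools or averages in exactly this way is not stated in the text.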
For the learning curve as a function of decreasing numbers of features (populations), CatBoost was trained on 66% of randomly sampled data and tested on the remaining data; this was repeated 100 times and results were assessed using the area under the receiver operating characteristic curve (AUC; to assess separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (ratio of correct positive predictions to all predicted positives), recall (ratio of correct positive predictions to actual positives) and F1 score (harmonic mean of precision and recall). Final CatBoost predictions on an optimised subset of features used 66% of randomly sampled data for training, and predictions were made on the remaining data. Orange 3 workflows are provided in S1 File.

## Statistical analysis and data presentation

For comparisons of means between the Nil, CT26 and 4T1 cohorts, data were transformed using the formula Y=Log(Y+1) to help normalise distributions and equalise variance, and then assessed by 2-way ANOVA using GraphPad Prism software. Analysis was corrected for multiple comparisons using the two-stage step-up method of Benjamini, Krieger and Yekutieli [36] with a false discovery rate of 0.05, and p values are reported to test the null hypothesis that the means are equal or that the distributions were from the same population.

PaCMAP used the pacmap Python package through CytoExploreR. Multidimensional scaling used the cmdscale function in the R package stats (v 3.6.2) with Euclidean distances and was displayed using the cyto_plot function in the CytoExploreR R package. Heatmap dot plots were made using several R packages, including ggplot2, ComplexHeatmap and HeatmapR. Log ratio (M) versus log average (A) (MA) plots were constructed using the ggpubr, ggplot2 and ggrepel R packages. Pythagorean trees and confusion matrices were made in Orange 3 software. Circular bar plots were made using ggplot2 in R. Prism was also used for plotting data.
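
As an illustration of the statistical treatment in this section, the sketch below applies the Y=Log(Y+1) transform and then the two-stage step-up false discovery rate procedure to a list of p values. It is a minimal standard-library sketch, not the GraphPad Prism implementation used in the study. The log base (10), the function names and the example p values are assumptions made for illustration.

```python
import math


def log_transform(values):
    """Y = log10(Y + 1); base 10 is an assumption, the text only says 'Log'."""
    return [math.log10(v + 1) for v in values]


def bh_reject_count(sorted_p, q):
    """Benjamini-Hochberg step-up: largest rank i with p_(i) <= q * i / m."""
    m = len(sorted_p)
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= q * i / m:
            k = i
    return k


def two_stage_step_up(p_values, q=0.05):
    """Two-stage step-up procedure of Benjamini, Krieger and Yekutieli.

    Stage 1 runs BH at q' = q / (1 + q) and uses the number of rejections
    to estimate the number of true nulls. Stage 2 reruns BH at the level
    q' * m / m0_hat. Returns one boolean per input p value (True = discovery).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    sorted_p = [p_values[i] for i in order]
    q1 = q / (1 + q)
    r1 = bh_reject_count(sorted_p, q1)
    if r1 == 0 or r1 == m:
        n_reject = r1  # nothing, or everything, rejected at stage 1
    else:
        m0_hat = m - r1  # estimated number of true null hypotheses
        n_reject = bh_reject_count(sorted_p, q1 * m / m0_hat)
    rejected = [False] * m
    for rank in range(n_reject):
        rejected[order[rank]] = True
    return rejected


# Hypothetical inputs, for illustration only.
transformed = log_transform([0, 9, 99])
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
discoveries = two_stage_step_up(p_values, q=0.05)
print(transformed)
print(discoveries)
```

With these made-up p values the two-stage procedure flags four of the five comparisons, whereas a single Benjamini-Hochberg pass at 0.05 would flag two, which shows why the adaptive second stage can add power when many null hypotheses are false.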

Provided by:
The Australian National University