Additional file 2 - datasets and scripts for metabolome analysis
Source: DataCite Commons. Updated 2024-04-29; indexed 2024-08-19.
Download link:
https://figshare.com/articles/dataset/Additional_file_2_-_datasets_and_scripts_for_metabolome_analysis/25684509
Description:
For the metabolome data, all calculations and statistical analyses were performed in Python. The Shapiro-Wilk test was used to identify metabolites whose blood concentrations were normally distributed, and Student’s t-test was used to compare these concentrations between the intrauterine growth restriction (IUGR) and normal (NORM) groups. Metabolites whose concentrations were not normally distributed were compared between the two groups with the non-parametric Mann–Whitney test. The Benjamini–Hochberg correction was applied in both cases to account for the inflation of type I error risk associated with multiple comparisons. Before being subjected to unsupervised and supervised algorithms, the concentration of each metabolite was normalised and centred. Principal component analysis (PCA) and orthogonal projection to latent structures-discriminant analysis (OPLS-DA) were used as the unsupervised and supervised methods, respectively, in the multivariate analysis. PCA was used to identify outliers (Mahalanobis distance metric) and to reveal spontaneous clustering of similar samples in the scatter plot of the two principal components. In the OPLS-DA analysis, the X matrix consisted of metabolite concentrations, while the Y vector encoded the group (IUGR or NORM). The goodness of fit of the OPLS-DA model (R2Y) was reported, and predictive performance was assessed through cross-validation. The predictive ability of the model (Q2Y) and the predictive ability of permuted models (Q2Y-perm) were calculated for evaluation. OPLS-DA loading plots were used to show the metabolites that contributed most to the separation between the IUGR and NORM groups. Metabolites of interest were identified by combining the variable importance in the projection (VIP) with the loading between each metabolite in the X matrix and the predictive latent variable (pLV) of the model.
Metabolites with VIP > 1.0 and high absolute loading values were considered important in the metabolomic signature (De la Barca et al., 2022).
References:
Chao de la Barca JM, Chabrun F, Lefebvre T, Roche O, Huetz N, Blanchet O, Legendre G, Simard G, Reynier P, Gascoin G: A Metabolomic Profiling of Intra-Uterine Growth Restriction in Placenta and Cord Blood Points to an Impairment of Lipid and Energetic Metabolism. Biomedicines 2022, 10:1411.
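The univariate step described above can be sketched roughly as follows. This is an illustrative sketch, not the scripts shipped in the dataset: the function names `benjamini_hochberg` and `compare_groups`, the `alpha = 0.05` normality threshold, the choice to run Shapiro-Wilk within each group, and the choice to apply the Benjamini–Hochberg correction separately to the t-test and Mann–Whitney families are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    if n == 0:
        return p
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def compare_groups(iugr, norm, alpha=0.05):
    """Compare each metabolite between two groups.

    iugr, norm: arrays of shape (n_samples, n_metabolites).
    Returns (is_normal, raw p-values, BH-adjusted p-values).
    """
    iugr = np.asarray(iugr, dtype=float)
    norm = np.asarray(norm, dtype=float)
    n_met = iugr.shape[1]
    pvals = np.empty(n_met)
    is_normal = np.zeros(n_met, dtype=bool)
    for j in range(n_met):
        a, b = iugr[:, j], norm[:, j]
        # Assumption: a metabolite counts as "normal" only if Shapiro-Wilk
        # does not reject normality in either group.
        is_normal[j] = (stats.shapiro(a).pvalue > alpha
                        and stats.shapiro(b).pvalue > alpha)
        if is_normal[j]:
            pvals[j] = stats.ttest_ind(a, b).pvalue
        else:
            pvals[j] = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    # Assumption: the correction is applied within each test family.
    qvals = np.empty(n_met)
    for mask in (is_normal, ~is_normal):
        if mask.any():
            qvals[mask] = benjamini_hochberg(pvals[mask])
    return is_normal, pvals, qvals
```

Calling `compare_groups(iugr, norm)` on two concentration matrices with matching metabolite columns returns, per metabolite, which test was used, the raw p-value, and the adjusted p-value.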
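The scaling and PCA-based outlier screening could look something like the sketch below. It assumes that "normalised and centred" means mean-centring with unit-variance scaling, that the Mahalanobis distance is computed on the scores of two principal components, and that a chi-square quantile (97.5% here) serves as the cutoff. The description states none of these details, and the function names are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Centre each metabolite to mean 0 and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X_scaled, n_components=2):
    """PCA via SVD of an already-centred matrix; returns the sample scores."""
    U, s, _ = np.linalg.svd(X_scaled, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def mahalanobis_sq(scores):
    """Squared Mahalanobis distance of each sample from the score centroid."""
    centred = scores - scores.mean(axis=0)
    cov = np.atleast_2d(np.cov(centred, rowvar=False))
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)


def flag_outliers(X, quantile=0.975):
    """Return (scores, squared distances, boolean outlier flags)."""
    scores = pca_scores(autoscale(X), n_components=2)
    d2 = mahalanobis_sq(scores)
    # With 2 components, the chi-square quantile has the closed form
    # -2 * ln(1 - q), so no extra dependency is needed for the cutoff.
    cutoff = -2.0 * np.log(1.0 - quantile)
    return scores, d2, d2 > cutoff
```

The returned scores can be plotted directly as the two-component scatter plot, with flagged samples highlighted for inspection before any supervised modelling.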
Provider:
figshare
Created:
2024-04-29



