Table1_Completing Single-Cell DNA Methylome Profiles via Transfer Learning Together With KL-Divergence.XLSX
Source: frontiersin.figshare.com | Updated: 2023-06-14 | Indexed: 2025-01-22
Download link:
https://frontiersin.figshare.com/articles/dataset/Table1_Completing_Single-Cell_DNA_Methylome_Profiles_via_Transfer_Learning_Together_With_KL-Divergence_XLSX/20354382/1
Description:
The high sparsity of methylome profiles obtained by whole-genome bisulfite sequencing when the amount of biological material is low limits their value in the study of systems in which large samples are difficult to assemble, such as mammalian preimplantation embryonic development. Recently developed computational methods that address the sparsity by imputing missing values reach their limits when the required minimum data coverage, or profiles of the same tissue in other modalities, are not available. In this study, we explored the use of transfer learning together with Kullback-Leibler (KL) divergence to train predictive models for completing methylome profiles with very low coverage (below 2%). Transfer learning was used to leverage less sparse profiles that are typically available for different tissues of the same species, while KL divergence was employed to maximize the use of the information carried in the input data. A deep neural network was adopted to extract both DNA sequence and local methylation patterns for imputation. Our study of training models for completing methylome profiles of bovine oocytes and early embryos demonstrates the effectiveness of transfer learning and KL divergence, with individual increases of 29.98% and 29.43%, respectively, in prediction performance, and a 38.70% increase when the two were used together. The drastically increased data coverage after imputation (43.80–73.6%) enables downstream methylome analyses that cannot be done effectively with the very low coverage profiles available before imputation (0.06–1.47%).
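For readers unfamiliar with the KL divergence mentioned in the description, the following is a minimal sketch of how it can be computed between an observed and a predicted methylation level, treating each CpG site as a Bernoulli variable. This is an illustration only, not the loss used by the authors: the function names `bernoulli_kl` and `mean_kl`, the clipping constant `eps`, and the use of `None` to mark uncovered sites are all assumptions made here.

```python
import math


def bernoulli_kl(p, q, eps=1e-7):
    """KL(p || q) for two Bernoulli distributions with methylation levels p and q.

    Both values are clipped to [eps, 1 - eps] so the logarithms stay finite
    (the clipping constant is an illustrative choice).
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def mean_kl(observed, predicted):
    """Average KL divergence over covered sites only.

    `observed` may contain None for sites with no reads (hypothetical
    convention); those sites are skipped.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o is not None]
    if not pairs:
        raise ValueError("no covered sites to compare")
    return sum(bernoulli_kl(o, p) for o, p in pairs) / len(pairs)
```

The divergence is zero when prediction and observation agree and grows as they move apart, which is why it can serve as a training signal on the few sites that a very sparse profile does cover.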
Provider:
Frontiers



