Synthetic Modeling of Emerging SARS-CoV-2 Recombinant Lineages via Markov Chain Mutation Forecasting
Source: DataCite Commons | Updated: 2025-07-08 | Indexed: 2026-04-25
Download link:
https://figshare.com/articles/dataset/Synthetic_Modeling_of_Emerging_SARS-CoV-2_Recombinant_Lineages_via_Markov_Chain_Mutation_Forecasting/29504837/1
Description:
This dataset contains 18 synthetic full-length SARS-CoV-2 genomes engineered using Markov chain modeling of Spike gene mutations derived from Nextclade TSV data. Each sequence represents a potential new PANGO lineage candidate, defined by a unique combination of mutations not seen in current classifications.

The Spike gene sequences were generated from mutation clusters and co-evolution patterns observed in recombinant XFG-like variants. They were then back-translated and integrated into a Wuhan-Hu-1 reference backbone to create full-genome FASTA files.

Supporting metadata files are also included:
- Mutation summary per genome
- List of novel amino acid (AA) substitutions

This work demonstrates how statistical modeling of mutation transitions can be used to simulate possible future lineages, enabling proactive genomic surveillance strategies.
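The description does not specify how the Markov chain was constructed, so the following is only a minimal sketch of one plausible reading, not the dataset author's method. It assumes each genome's Spike substitutions are taken from a Nextclade-style comma-separated string (such as the `aaSubstitutions` column), ordered by residue position, and treated as a path through a first-order chain. The input rows, function names, and the `<start>`/`<end>` states are all hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # hypothetical boundary states


def spike_mutations(aa_substitutions):
    """Keep only Spike ("S:") substitutions, sorted by residue position."""
    muts = [m.split(":", 1)[1] for m in aa_substitutions.split(",") if m.startswith("S:")]
    return sorted(muts, key=lambda m: int(m[1:-1]))  # "D614G" -> 614


def fit_transitions(profiles):
    """Estimate first-order transition probabilities between consecutive mutations."""
    counts = defaultdict(Counter)
    for profile in profiles:
        path = [START] + profile + [END]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def sample_profile(transitions, rng, max_len=50):
    """Walk the chain from START until END, collecting a synthetic mutation profile."""
    state, profile = START, []
    while len(profile) < max_len:
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        profile.append(state)
    return profile


# Toy input, invented for illustration only (not taken from the dataset).
rows = [
    "S:D614G,S:N501Y,ORF1a:T3255I",
    "S:D614G,S:E484K,S:N501Y",
    "S:N501Y,S:D614G,S:P681R",
]
profiles = [spike_mutations(r) for r in rows]
transitions = fit_transitions(profiles)
synthetic = sample_profile(transitions, random.Random(0))
print(synthetic)
```

Because mutations are ordered by position, every transition moves toward a higher residue number, so the walk always terminates. A sampled profile would still need to be compared against known lineage definitions, back-translated to nucleotides, and placed into the reference backbone, none of which is covered by this sketch.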
Provider:
figshare
Created:
2025-07-08



